DOCA Bench Sample Invocations
This guide provides examples of various invocations of DOCA Bench to help provide guidance and insight into the tool and the features under test.
To make the samples clearer, certain verbose output and repeated information has been removed or shortened; in particular, the output of the configuration and defaults shown when DOCA Bench first executes has been removed.
The command line options may need to be updated to suit your environment (e.g., TCP addresses, port numbers, interface names, usernames). See the "Command-line Parameters" section for more information.
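For example, the environment-specific values of the first sample below can be factored into shell variables to make adaptation explicit; this is an illustrative sketch only, and the values shown are the placeholders used throughout these samples:
# Environment-specific placeholders (adjust for your setup)
DEV=b1:00.1                 # PCI address of the device under test
COMP_ADDR=10.10.10.10       # address of the companion system
COMP_PORT=12345             # TCP port for the companion connection
COMP_USER=bob               # user account on the companion system
COMP_DEV=ens4f1np1          # interface name on the companion side

doca_bench --device "$DEV" \
    --companion-connection-string proto=tcp,addr="$COMP_ADDR",port="$COMP_PORT",user="$COMP_USER",dev="$COMP_DEV"
# ...remaining options from the chosen sample go here as further continuations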
This test invokes DOCA Bench to run in Ethernet receive mode, configured to receive Ethernet frames of size 1500 bytes.
The test runs for 3 seconds using a single core and uses a maximum burst size of 512 frames.
The test runs in the default throughput mode, with throughput figures displayed at the end of the test run.
The companion application uses 6 cores to continuously transmit Ethernet frames of size 1500 bytes until it is stopped by DOCA Bench.
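The companion connection string used throughout these samples is a comma-separated list of key=value fields. The field meanings below are inferred from the samples on this page; consult the "Command-line Parameters" section for the authoritative definitions:
# proto=tcp          transport protocol used to reach the companion
# addr=10.10.10.10   address at which the companion system is reached
# port=12345         TCP port used for the companion connection
# user=bob           user account under which the companion application runs
# dev=ens4f1np1      device or interface used on the companion side
# mode=host|dpu      side on which the companion runs (appears in later samples)
--companion-connection-string proto=tcp,addr=10.10.10.10,port=12345,user=bob,dev=ens4f1np1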
Command Line
doca_bench --core-mask 0x02 \
    --pipeline-steps doca_eth::rx \
    --device b1:00.1 \
    --data-provider random-data \
    --uniform-job-size 1500 \
    --run-limit-seconds 3 \
    --attribute doca_eth.max-burst-size=512 \
    --companion-connection-string proto=tcp,addr=10.10.10.10,port=12345,user=bob,dev=ens4f1np1 \
    --attribute doption.companion_app.path=/opt/mellanox/doca/tools/doca_bench_companion \
    --companion-core-list 6 \
    --job-output-buffer-size 1500 \
    --mtu-size raw_eth
Results Output
[main] doca_bench : 2.7.0084
[main] release build
+ + + + + + + + + + + + + + + + + + + + + + + + + +
DOCA bench supported modules: [doca_comm_channel, doca_compress, doca_dma, doca_ec, doca_eth, doca_sha, doca_comch, doca_rdma, doca_aes_gcm]
+ + + + + + + + + + + + + + + + + + + + + + + + + +
DOCA bench configuration
Static configuration: [
Attributes: [doca_eth.l4-chksum-offload:false, doca_eth.max-burst-size:512, doption.companion_app.path:/opt/mellanox/doca/tools/doca_bench_companion, doca_eth.l3-chksum-offload:false]
Companion configuration: [
Device: ens4f1np1
Remote IP address: "bob@10.10.10.10"
Core set: [6]
]
Pipelines: [
Steps: [
name: "doca_eth::rx"
attributes: []
]
Use remote input buffers: no
Use remote output buffers: no
Latency bucket_range: 10000ns-110000ns
]
Run limits: [
Max execution time: 3seconds
Max jobs executed: -- not configured --
Max bytes processed: -- not configured --
]
Data provider: [
Name: "random-data"
Job output buffer size: 1500
]
Device: "b1:00.1"
Device representor: "-- not configured --"
Warm up job count: 100
Input files dir: "-- not configured --"
Output files dir: "-- not configured --"
Core set: [1]
Benchmark mode: throughput
Warnings as errors: no
CSV output: [
File name: -- not configured --
Selected stats: []
Deselected stats: []
Separate dynamic values: no
Collect environment information: no
Append to stats file: no
]
]
Test permutations: [
Attributes: []
Uniform job size: 1500
Core count: 1
Per core thread count: 1
Task pool size: 1024
Data provider job count: 128
MTU size: ETH_FRAME
SQ depth: -- not configured --
RQ depth: -- not configured --
Input data file: -- not configured --
]
[main] Initialize framework...
[main] Start execution...
Preparing...
EAL: Detected CPU lcores: 36
EAL: Detected NUMA nodes: 4
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /run/user/48679/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
TELEMETRY: No legacy callbacks, legacy socket not created
EAL: Probe PCI driver: mlx5_pci (15b3:a2d6) device: 0000:b1:00.1 (socket 2)
[08:19:32:110524][398304][DOCA][WRN][engine_model.c:90][adapt_queue_depth] adapting queue depth to 128.
Executing...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
Cleanup...
[main] Completed! tearing down...
Aggregate stats
Duration: 3000633 micro seconds
Enqueued jobs: 611215
Dequeued jobs: 611215
Throughput: 000.204 MOperations/s
Ingress rate: 002.276 Gib/s
Egress rate: 002.276 Gib/s
Results Overview
As a single core is specified, there is a single section of statistics output displayed.
This test invokes DOCA Bench to run in Ethernet send mode, configured to transmit Ethernet frames of size 1500 bytes.
Random data is used to populate the Ethernet frames.
The test runs for 3 seconds using a single core and uses a maximum burst size of 512 frames.
L3 and L4 checksum offloading is not enabled.
The test runs in the default throughput mode, with throughput figures displayed at the end of the test run.
The companion application uses 6 cores to continuously receive Ethernet frames of size 1500 bytes until it is stopped by DOCA Bench.
Command Line
doca_bench --core-mask 0x02 \
    --pipeline-steps doca_eth::tx \
    --device b1:00.1 \
    --data-provider random-data \
    --uniform-job-size 1500 \
    --run-limit-seconds 3 \
    --attribute doca_eth.max-burst-size=512 \
    --attribute doca_eth.l4-chksum-offload=false \
    --attribute doca_eth.l3-chksum-offload=false \
    --companion-connection-string proto=tcp,addr=10.10.10.10,port=12345,user=bob,dev=ens4f1np1 \
    --attribute doption.companion_app.path=/opt/mellanox/doca/tools/doca_bench_companion \
    --companion-core-list 6 \
    --job-output-buffer-size 1500
Results Output
[main] doca_bench : 2.7.0084
[main] release build
+ + + + + + + + + + + + + + + + + + + + + + + + + +
DOCA bench supported modules: [doca_comm_channel, doca_compress, doca_dma, doca_ec, doca_eth, doca_sha, doca_comch, doca_rdma, doca_aes_gcm]
+ + + + + + + + + + + + + + + + + + + + + + + + + +
DOCA bench configuration
Static configuration: [
Attributes: [doca_eth.l4-chksum-offload:false, doca_eth.max-burst-size:512, doption.companion_app.path:/opt/mellanox/doca/tools/doca_bench_companion, doca_eth.l3-chksum-offload:false]
Companion configuration: [
Device: ens4f1np1
Remote IP address: "bob@10.10.10.10"
Core set: [6]
]
Pipelines: [
Steps: [
name: "doca_eth::tx"
attributes: []
]
Use remote input buffers: no
Use remote output buffers: no
Latency bucket_range: 10000ns-110000ns
]
Run limits: [
Max execution time: 3seconds
Max jobs executed: -- not configured --
Max bytes processed: -- not configured --
]
Data provider: [
Name: "random-data"
Job output buffer size: 1500
]
Device: "b1:00.1"
Device representor: "-- not configured --"
Warm up job count: 100
Input files dir: "-- not configured --"
Output files dir: "-- not configured --"
Core set: [1]
Benchmark mode: throughput
Warnings as errors: no
CSV output: [
File name: -- not configured --
Selected stats: []
Deselected stats: []
Separate dynamic values: no
Collect environment information: no
Append to stats file: no
]
]
Test permutations: [
Attributes: []
Uniform job size: 1500
Core count: 1
Per core thread count: 1
Task pool size: 1024
Data provider job count: 128
MTU size: -- not configured --
SQ depth: -- not configured --
RQ depth: -- not configured --
Input data file: -- not configured --
]
[main] Initialize framework...
[main] Start execution...
Preparing...
Executing...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
Cleanup...
[main] Completed! tearing down...
Aggregate stats
Duration: 3000049 micro seconds
Enqueued jobs: 17135128
Dequeued jobs: 17135128
Throughput: 005.712 MOperations/s
Ingress rate: 063.832 Gib/s
Egress rate: 063.832 Gib/s
Results Overview
As a single core is specified, there is a single section of statistics output displayed.
This test invokes DOCA Bench on the x86 host side to run the AES-GCM decryption step.
A file-set file is used to indicate which file is to be decrypted; its contents list the filename to be decrypted.
The key to be used for the encryption and decryption is specified using the doca_aes_gcm.key-file attribute, which names the file containing the key.
It runs until 5000 jobs have been processed.
It runs in precision-latency mode, with latency and throughput figures displayed at the end of the test run.
A core mask is specified to indicate that cores 12, 13, 14, and 15 are to be used for this test.
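Core masks appear in several samples; a bit set at position n selects core n. A quick shell sketch (a convenience for checking masks, not part of DOCA Bench) confirms that 0xf000 selects cores 12 through 15:
mask=0xf000
for i in $(seq 0 31); do
    [ $(( (mask >> i) & 1 )) -eq 1 ] && echo "core $i"
done
# prints: core 12, core 13, core 14, core 15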
Command Line
doca_bench --mode precision-latency \
    --core-mask 0xf000 \
    --warm-up-jobs 32 \
    --device 17:00.0 \
    --data-provider file-set \
    --data-provider-input-file aes_64_128.fileset \
    --run-limit-jobs 5000 \
    --pipeline-steps doca_aes_gcm::decrypt \
    --attribute doca_aes_gcm.key-file='aes128.key' \
    --job-output-buffer-size 80
Results Output
[main] Completed! tearing down...
Worker thread[0](core: 12) stats:
Duration: 10697 micro seconds
Enqueued jobs: 5000
Dequeued jobs: 5000
Throughput: 000.467 MOperations/s
Ingress rate: 000.265 Gib/s
Egress rate: 000.223 Gib/s
Worker thread[1](core: 13) stats:
Duration: 10700 micro seconds
Enqueued jobs: 5000
Dequeued jobs: 5000
Throughput: 000.467 MOperations/s
Ingress rate: 000.265 Gib/s
Egress rate: 000.223 Gib/s
Worker thread[2](core: 14) stats:
Duration: 10733 micro seconds
Enqueued jobs: 5000
Dequeued jobs: 5000
Throughput: 000.466 MOperations/s
Ingress rate: 000.264 Gib/s
Egress rate: 000.222 Gib/s
Worker thread[3](core: 15) stats:
Duration: 10788 micro seconds
Enqueued jobs: 5000
Dequeued jobs: 5000
Throughput: 000.463 MOperations/s
Ingress rate: 000.262 Gib/s
Egress rate: 000.221 Gib/s
Aggregate stats
Duration: 10788 micro seconds
Enqueued jobs: 20000
Dequeued jobs: 20000
Throughput: 001.854 MOperations/s
Ingress rate: 001.050 Gib/s
Egress rate: 000.884 Gib/s
min: 1878ns
max: 4956ns
median: 2134ns
mean: 2145ns
90th %ile: 2243ns
95th %ile: 2285ns
99th %ile: 2465ns
99.9th %ile: 3193ns
99.99th %ile: 4487ns
Results Overview
Since a core mask is specified but no core count, all cores in the mask are used.
There is a section of statistics displayed for each core used as well as the aggregate statistics.
This test invokes DOCA Bench on the BlueField side to run the AES-GCM encryption step.
A text file of size 2KB is the input for the encryption stage.
The key to be used for the encryption and decryption is specified using the doca_aes_gcm.key attribute.
It runs until 2000 jobs have been processed.
It runs in bulk-latency mode, with latency and throughput figures displayed at the end of the test run (see the bucket sketch below).
A single core is specified with 2 threads.
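The latency histogram in the output below is built from fixed-width buckets. The configuration dumps earlier on this page show a bucket range of 10000ns-110000ns, and the CSV output later on this page shows a lower bound of 10000 and a bucket width of 1000; assuming that interpretation, the bucket for a given sample can be computed as follows (the variable names are illustrative):
latency_ns=28500                            # one measured latency sample
bucket=$(( (latency_ns - 10000) / 1000 ))   # index above the lower bound
echo "$bucket"                              # -> 18, reported as [28000ns -> 28999ns]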
Command Line
doca_bench --mode bulk-latency \
    --core-list 3 \
    --threads-per-core 2 \
    --warm-up-jobs 32 \
    --device 03:00.0 \
    --data-provider file \
    --data-provider-input-file plaintext_2k.txt \
    --run-limit-jobs 2000 \
    --pipeline-steps doca_aes_gcm::encrypt \
    --attribute doca_aes_gcm.key="0123456789abcdef0123456789abcdef" \
    --uniform-job-size 2048 \
    --job-output-buffer-size 4096
Results Output
[main] Completed! tearing down...
Worker thread[0](core: 3) stats:
Duration: 501 micro seconds
Enqueued jobs: 2048
Dequeued jobs: 2048
Throughput: 004.082 MOperations/s
Ingress rate: 062.279 Gib/s
Egress rate: 062.644 Gib/s
Worker thread[1](core: 3) stats:
Duration: 466 micro seconds
Enqueued jobs: 2048
Dequeued jobs: 2048
Throughput: 004.386 MOperations/s
Ingress rate: 066.922 Gib/s
Egress rate: 067.314 Gib/s
Aggregate stats
Duration: 501 micro seconds
Enqueued jobs: 4096
Dequeued jobs: 4096
Throughput: 008.163 MOperations/s
Ingress rate: 124.558 Gib/s
Egress rate: 125.287 Gib/s
Latency report:
[ASCII histogram omitted; its column alignment was lost in extraction. The per-bucket counts follow.]
------------------------------------------------------------------------------------------------------
[<10000ns]: 0
.. OUTPUT RETRACTED (SHORTENED) ..
[26000ns -> 26999ns]: 0
[27000ns -> 27999ns]: 128
[28000ns -> 28999ns]: 2176
[29000ns -> 29999ns]: 1152
[30000ns -> 30999ns]: 128
[31000ns -> 31999ns]: 0
[32000ns -> 32999ns]: 0
[33000ns -> 33999ns]: 128
[34000ns -> 34999ns]: 0
[35000ns -> 35999ns]: 0
[36000ns -> 36999ns]: 0
[37000ns -> 37999ns]: 0
[38000ns -> 38999ns]: 128
[39000ns -> 39999ns]: 0
[40000ns -> 40999ns]: 0
[41000ns -> 41999ns]: 0
[42000ns -> 42999ns]: 0
[43000ns -> 43999ns]: 128
[44000ns -> 44999ns]: 128
[45000ns -> 45999ns]: 0
.. OUTPUT RETRACTED (SHORTENED) ..
[>110000ns]: 0
Results Overview
Since a single core is specified, there is a single section of statistics output displayed.
This test invokes DOCA Bench on the host side to run two AES-GCM steps in the pipeline, first to encrypt a text file and then to decrypt the associated output from the encrypt step.
A text file of size 2KB is the input for the encryption stage.
The input-cwd option instructs DOCA Bench to look in a different location for the input file, in this case the parent directory.
The key to be used for the encryption and decryption is specified using the doca_aes_gcm.key-file attribute, indicating that the key can be found in the specified file.
It runs until 204800 bytes have been processed.
It runs in the default throughput mode, with throughput figures displayed at the end of the test run.
Command Line
doca_bench --core-mask 0xf00 \
    --core-count 1 \
    --warm-up-jobs 32 \
    --device 17:00.0 \
    --data-provider file \
    --input-cwd ../. \
    --data-provider-input-file plaintext_2k.txt \
    --run-limit-bytes 204800 \
    --pipeline-steps doca_aes_gcm::encrypt,doca_aes_gcm::decrypt \
    --attribute doca_aes_gcm.key-file='aes128.key' \
    --uniform-job-size 2048 \
    --job-output-buffer-size 4096
Results Output
Executing...
Worker thread[0](core: 8) [doca_aes_gcm::encrypt>>doca_aes_gcm::decrypt] started...
Worker thread[0] Executing 32 warm-up tasks using 32 unique tasks
Cleanup...
[main] Completed! tearing down...
Aggregate stats
Duration: 79 micro seconds
Enqueued jobs: 214
Dequeued jobs: 214
Throughput: 002.701 MOperations/s
Ingress rate: 041.214 Gib/s
Egress rate: 041.214 Gib/s
Results Overview
Since a single core is specified, there is a single section of statistics output displayed.
This test invokes DOCA Bench on the host side to execute the SHA operation using the SHA256 algorithm and to create a CSV file containing the test configuration and statistics.
A core mask selecting a single core is provided, with a count of 2 threads per core.
Command Line
doca_bench --core-mask 2 \
    --threads-per-core 2 \
    --pipeline-steps doca_sha \
    --device d8:00.0 \
    --data-provider random-data \
    --uniform-job-size 2048 \
    --job-output-buffer-size 2048 \
    --run-limit-seconds 3 \
    --attribute doca_sha.algorithm=sha256 \
    --warm-up-jobs 100 \
    --csv-output-file /tmp/sha_256_test.csv
Results Output
Executing...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
Data path thread [1] started...
WT[1] Executing 100 warm-up tasks using 100 unique tasks
Cleanup...
[main] Completed! tearing down...
Stats for thread[0](core: 1)
Duration: 3000064 micro seconds
Enqueued jobs: 3713935
Dequeued jobs: 3713935
Throughput: 001.238 MOperations/s
Ingress rate: 018.890 Gib/s
Egress rate: 000.295 Gib/s
Stats for thread[1](core: 1)
Duration: 3000056 micro seconds
Enqueued jobs: 3757335
Dequeued jobs: 3757335
Throughput: 001.252 MOperations/s
Ingress rate: 019.110 Gib/s
Egress rate: 000.299 Gib/s
Aggregate stats
Duration: 3000064 micro seconds
Enqueued jobs: 7471270
Dequeued jobs: 7471270
Throughput: 002.490 MOperations/s
Ingress rate: 038.000 Gib/s
Egress rate: 000.594 Gib/s
Results Overview
As a single core has been specified with a thread count of 2, there are statistics displayed for each thread as well as the aggregate statistics.
It can also be observed that 2 threads are started on core 1 with each thread executing the warm-up jobs.
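The egress rate also reflects the fixed 32-byte SHA-256 digest size: 2.490370 MOperations/s × 32 B × 8 bits ≈ 0.594 Gib/s, matching the reported aggregate egress rate.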
The contents of /tmp/sha_256_test.csv are shown below. It can be seen that the configuration used for the test and the associated statistics from the test run are listed:
cfg.companion.connection_string,cfg.pipeline.steps,cfg.pipeline.use_remote_input_buffers,cfg.pipeline.use_remote_output_buffers,cfg.pipeline.bulk_latency.lower_bound,cfg.pipeline.bulk_latency.bucket_width,cfg.run_limit.duration,cfg.run_limit.jobs,cfg.run_limit.bytes,cfg.data_provider.type,cfg.data_provider.output_buffer_size,cfg.device.pci_address,cfg.input.cwd,cfg.output.cwd,cfg.warmup_job_count,cfg.core_set,cfg.benchmark_mode,cfg.warnings_are_errors,cfg.attribute.doca_compress.algorithm,cfg.attribute.doca_ec.matrix_type,cfg.attribute.doca_ec.data_block_count,cfg.attribute.doca_ec.redundancy_block_count,cfg.attribute.doca_ec.use_precomputed_matrix,cfg.attribute.doca_eth.l3_chksum_offload,cfg.attribute.doca_eth.l4_chksum_offload,cfg.attribute.doca_sha.algorithm,cfg.uniform_job_size,cfg.core_count,cfg.per_core_thread_count,cfg.task_pool_size,cfg.data_provider_job_count,cfg.sg_config,cfg.mtu-size,cfg.send-queue-size,cfg.receive-queue-size,cfg.data-provider-input-file,cfg.attribute.mmo.log_qp_depth,cfg.attribute.mmo.log_num_qps,stats.input.job_count,stats.output.job_count,stats.input.byte_count,stats.output.byte_count,stats.input.throughput.bytes,stats.output.throughput.bytes,stats.input.throughput.rate,stats.output.throughput.rate
,[doca_sha],0,0,10000,1000,3,,,random-data,2048,d8:00.0,,,100,[1],throughput,0,,,,,,,,sha256,2048,1,2,1024,128,1 fragments,,,,,,,7471270,7471270,15301160960,239109312,038.000 Gib/s,000.594 Gib/s,2.490370 MOperations/s,2.490370 MOperations/s
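Because each record in this CSV is a single wide row, it can be easier to inspect as field=value pairs. A minimal sketch using standard shell tools (note that this naive comma split would break on quoted fields containing commas, such as the core-set lists in the DMA samples later on this page):
paste -d '=' <(head -1 /tmp/sha_256_test.csv | tr ',' '\n') \
             <(tail -1 /tmp/sha_256_test.csv | tr ',' '\n')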
This test invokes DOCA Bench on the host side to execute the SHA operation using the SHA512 algorithm and to create a CSV file containing the test configuration and statistics.
The command is then repeated with the added option csv-append-mode. This instructs DOCA Bench to append the test run statistics to the existing CSV file.
A list of 1 core is provided with a count of 2 threads per core.
Command Line
Create the initial /tmp/sha_512_test.csv file:
doca_bench --core-list 2 \
    --threads-per-core 2 \
    --pipeline-steps doca_sha \
    --device d8:00.0 \
    --data-provider random-data \
    --uniform-job-size 2048 \
    --job-output-buffer-size 2048 \
    --run-limit-seconds 3 \
    --attribute doca_sha.algorithm=sha512 \
    --warm-up-jobs 100 \
    --csv-output-file /tmp/sha_512_test.csv
The second command is:
./doca_bench --core-list 2 \
    --threads-per-core 2 \
    --pipeline-steps doca_sha \
    --device d8:00.0 \
    --data-provider random-data \
    --uniform-job-size 2048 \
    --job-output-buffer-size 2048 \
    --run-limit-seconds 3 \
    --attribute doca_sha.algorithm=sha512 \
    --warm-up-jobs 100 \
    --csv-output-file /tmp/sha_512_test.csv \
    --csv-append-mode
This causes DOCA Bench to append the configuration and statistics from the second command run to the /tmp/sha_512_test.csv file.
Results Output
This is a snapshot of the results output from the first command run:
Executing...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
Data path thread [1] started...
WT[1] Executing 100 warm-up tasks using 100 unique tasks
Cleanup...
[main] Completed! tearing down...
Stats for thread[0](core: 2)
Duration: 3015185 micro seconds
Enqueued jobs: 3590717
Dequeued jobs: 3590717
Throughput: 001.191 MOperations/s
Ingress rate: 018.171 Gib/s
Egress rate: 000.568 Gib/s
Stats for thread[1](core: 2)
Duration: 3000203 micro seconds
Enqueued jobs: 3656044
Dequeued jobs: 3656044
Throughput: 001.219 MOperations/s
Ingress rate: 018.594 Gib/s
Egress rate: 000.581 Gib/s
Aggregate stats
Duration: 3015185 micro seconds
Enqueued jobs: 7246761
Dequeued jobs: 7246761
Throughput: 002.403 MOperations/s
Ingress rate: 036.673 Gib/s
Egress rate: 001.146 Gib/s
This is a snapshot of the results output from the second command run:
Executing...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
Data path thread [1] started...
WT[1] Executing 100 warm-up tasks using 100 unique tasks
Cleanup...
[main] Completed! tearing down...
Stats for thread[0](core: 2)
Duration: 3000072 micro seconds
Enqueued jobs: 3602562
Dequeued jobs: 3602562
Throughput: 001.201 MOperations/s
Ingress rate: 018.323 Gib/s
Egress rate: 000.573 Gib/s
Stats for thread[1](core: 2)
Duration: 3000062 micro seconds
Enqueued jobs: 3659148
Dequeued jobs: 3659148
Throughput: 001.220 MOperations/s
Ingress rate: 018.611 Gib/s
Egress rate: 000.582 Gib/s
Aggregate stats
Duration: 3000072 micro seconds
Enqueued jobs: 7261710
Dequeued jobs: 7261710
Throughput: 002.421 MOperations/s
Ingress rate: 036.934 Gib/s
Egress rate: 001.154 Gib/s
Results Overview
Since a single core has been specified with a thread count of 2, there are statistics displayed for each thread as well as the aggregate statistics.
It can also be observed that 2 threads are started on core 2, with each thread executing the warm-up jobs.
The contents of /tmp/sha_512_test.csv, after the first command has been run, are shown below. It can be seen that the configuration used for the test and the associated statistics from the test run are listed:
cfg.companion.connection_string,cfg.pipeline.steps,cfg.pipeline.use_remote_input_buffers,cfg.pipeline.use_remote_output_buffers,cfg.pipeline.bulk_latency.lower_bound,cfg.pipeline.bulk_latency.bucket_width,cfg.run_limit.duration,cfg.run_limit.jobs,cfg.run_limit.bytes,cfg.data_provider.type,cfg.data_provider.output_buffer_size,cfg.device.pci_address,cfg.input.cwd,cfg.output.cwd,cfg.warmup_job_count,cfg.core_set,cfg.benchmark_mode,cfg.warnings_are_errors,cfg.attribute.doca_compress.algorithm,cfg.attribute.doca_ec.matrix_type,cfg.attribute.doca_ec.data_block_count,cfg.attribute.doca_ec.redundancy_block_count,cfg.attribute.doca_ec.use_precomputed_matrix,cfg.attribute.doca_eth.l3_chksum_offload,cfg.attribute.doca_eth.l4_chksum_offload,cfg.attribute.doca_sha.algorithm,cfg.uniform_job_size,cfg.core_count,cfg.per_core_thread_count,cfg.task_pool_size,cfg.data_provider_job_count,cfg.sg_config,cfg.mtu-size,cfg.send-queue-size,cfg.receive-queue-size,cfg.data-provider-input-file,cfg.attribute.mmo.log_qp_depth,cfg.attribute.mmo.log_num_qps,stats.input.job_count,stats.output.job_count,stats.input.byte_count,stats.output.byte_count,stats.input.throughput.bytes,stats.output.throughput.bytes,stats.input.throughput.rate,stats.output.throughput.rate
,[doca_sha],0,0,10000,1000,3,,,random-data,2048,d8:00.0,,,100,[2],throughput,0,,,,,,,,sha512,2048,1,2,1024,128,1 fragments,,,,,,,7246761,7246761,14841366528,463850048,036.673 Gib/s,001.146 Gib/s,2.403422 MOperations/s,2.403422 MOperations/s
The contents of /tmp/sha_512_test.csv, after the second command has been run, are shown below. It can be seen that a second entry has been added detailing the configuration used for the test and the associated statistics from the test run:
cfg.companion.connection_string,cfg.pipeline.steps,cfg.pipeline.use_remote_input_buffers,cfg.pipeline.use_remote_output_buffers,cfg.pipeline.bulk_latency.lower_bound,cfg.pipeline.bulk_latency.bucket_width,cfg.run_limit.duration,cfg.run_limit.jobs,cfg.run_limit.bytes,cfg.data_provider.type,cfg.data_provider.output_buffer_size,cfg.device.pci_address,cfg.input.cwd,cfg.output.cwd,cfg.warmup_job_count,cfg.core_set,cfg.benchmark_mode,cfg.warnings_are_errors,cfg.attribute.doca_compress.algorithm,cfg.attribute.doca_ec.matrix_type,cfg.attribute.doca_ec.data_block_count,cfg.attribute.doca_ec.redundancy_block_count,cfg.attribute.doca_ec.use_precomputed_matrix,cfg.attribute.doca_eth.l3_chksum_offload,cfg.attribute.doca_eth.l4_chksum_offload,cfg.attribute.doca_sha.algorithm,cfg.uniform_job_size,cfg.core_count,cfg.per_core_thread_count,cfg.task_pool_size,cfg.data_provider_job_count,cfg.sg_config,cfg.mtu-size,cfg.send-queue-size,cfg.receive-queue-size,cfg.data-provider-input-file,cfg.attribute.mmo.log_qp_depth,cfg.attribute.mmo.log_num_qps,stats.input.job_count,stats.output.job_count,stats.input.byte_count,stats.output.byte_count,stats.input.throughput.bytes,stats.output.throughput.bytes,stats.input.throughput.rate,stats.output.throughput.rate
,[doca_sha],0,0,10000,1000,3,,,random-data,2048,d8:00.0,,,100,[2],throughput,0,,,,,,,,sha512,2048,1,2,1024,128,1 fragments,,,,,,,7246761,7246761,14841366528,463850048,036.673 Gib/s,001.146 Gib/s,2.403422 MOperations/s,2.403422 MOperations/s
,[doca_sha],0,0,10000,1000,3,,,random-data,2048,d8:00.0,,,100,[2],throughput,0,,,,,,,,sha512,2048,1,2,1024,128,1 fragments,,,,,,,7261710,7261710,14871982080,464806784,036.934 Gib/s,001.154 Gib/s,2.420512 MOperations/s,2.420512 MOperations/s
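A quick way to confirm that the append produced a second data row rather than a second header line (a sketch using standard tools):
tail -n +2 /tmp/sha_512_test.csv | wc -l    # expect 2 after the second run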
This test invokes DOCA Bench on the BlueField side to execute the SHA operation using the SHA1 algorithm and to display statistics every 2000 milliseconds during the test run.
A list of 3 cores is provided with a count of 2 threads per core and a core-count of 1.
The core-count instructs DOCA Bench to use only the first core number in the core list, in this case core number 2.
Command Line
doca_bench --core-list 2,3,4 \
    --core-count 1 \
    --threads-per-core 2 \
    --pipeline-steps doca_sha \
    --device 03:00.0 \
    --data-provider random-data \
    --uniform-job-size 2048 \
    --job-output-buffer-size 2048 \
    --run-limit-seconds 3 \
    --attribute doca_sha.algorithm=sha1 \
    --warm-up-jobs 100 \
    --rt-stats-interval 2000
Results Output
Executing...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
Data path thread [1] started...
WT[1] Executing 100 warm-up tasks using 100 unique tasks
Stats for thread[0](core: 2)
Duration: 965645 micro seconds
Enqueued jobs: 1171228
Dequeued jobs: 1171228
Throughput: 001.213 MOperations/s
Ingress rate: 018.505 Gib/s
Egress rate: 000.181 Gib/s
Stats for thread[1](core: 2)
Duration: 965645 micro seconds
Enqueued jobs: 1171754
Dequeued jobs: 1171754
Throughput: 001.213 MOperations/s
Ingress rate: 018.514 Gib/s
Egress rate: 000.181 Gib/s
Aggregate stats
Duration: 965645 micro seconds
Enqueued jobs: 2342982
Dequeued jobs: 2342982
Throughput: 002.426 MOperations/s
Ingress rate: 037.019 Gib/s
Egress rate: 000.362 Gib/s
Stats for thread[0](core: 2)
Duration: 2968088 micro seconds
Enqueued jobs: 3653691
Dequeued jobs: 3653691
Throughput: 001.231 MOperations/s
Ingress rate: 018.783 Gib/s
Egress rate: 000.183 Gib/s
Stats for thread[1](core: 2)
Duration: 2968088 micro seconds
Enqueued jobs: 3689198
Dequeued jobs: 3689198
Throughput: 001.243 MOperations/s
Ingress rate: 018.965 Gib/s
Egress rate: 000.185 Gib/s
Aggregate stats
Duration: 2968088 micro seconds
Enqueued jobs: 7342889
Dequeued jobs: 7342889
Throughput: 002.474 MOperations/s
Ingress rate: 037.748 Gib/s
Egress rate: 000.369 Gib/s
Cleanup...
[main] Completed! tearing down...
Stats for thread[0](core: 2)
Duration: 3000122 micro seconds
Enqueued jobs: 3694128
Dequeued jobs: 3694128
Throughput: 001.231 MOperations/s
Ingress rate: 018.789 Gib/s
Egress rate: 000.184 Gib/s
Stats for thread[1](core: 2)
Duration: 3000089 micro seconds
Enqueued jobs: 3751128
Dequeued jobs: 3751128
Throughput: 001.250 MOperations/s
Ingress rate: 019.079 Gib/s
Egress rate: 000.186 Gib/s
Aggregate stats
Duration: 3000122 micro seconds
Enqueued jobs: 7445256
Dequeued jobs: 7445256
Throughput: 002.482 MOperations/s
Ingress rate: 037.867 Gib/s
Egress rate: 000.370 Gib/s
Results Overview
Although a core list of 3 cores has been specified, the core-count value of 1 instructs DOCA Bench to use the first entry in the core list.
It can be seen that as a thread-count of 2 has been specified, there are 2 threads created on core 2.
A transient statistics interval of 2000 milliseconds has been specified, and the transient statistics per thread can be seen, as well as the final aggregate statistics.
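The transient counters are cumulative from the start of the run, so the rate within an interval follows from the deltas between reports: the aggregate moves from 2342982 jobs at 965645 microseconds to 7342889 jobs at 2968088 microseconds, i.e. (7342889 - 2342982) / (2968088 - 965645) microseconds ≈ 2.497 MOperations/s, consistent with the reported throughput.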
This test invokes DOCA Bench to execute a local DMA operation on the host.
It specifies that a core sweep should be carried out using core counts of 1, 2, and 4 via the option --sweep core-count,1,4,*2 (see the sketch below).
Test output is to be saved in the CSV file /tmp/dma_sweep.csv, and a filter is applied so that only statistics information is recorded; no configuration information is written.
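In this example the sweep argument is core-count,1,4,*2: starting from 1 and multiplying by 2 up to 4, which yields the three core-count permutations (1, 2, and 4) reported in the output below. A small shell sketch reproducing the generated values (illustrative only, not part of DOCA Bench):
v=1
while [ "$v" -le 4 ]; do
    echo "core-count permutation: $v"
    v=$(( v * 2 ))
done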
Command Line
doca_bench --core-mask 0xff \
    --sweep core-count,1,4,*2 \
    --pipeline-steps doca_dma \
    --device d8:00.0 \
    --data-provider random-data \
    --uniform-job-size 2048 \
    --job-output-buffer-size 2048 \
    --run-limit-seconds 5 \
    --csv-output-file /tmp/dma_sweep.csv \
    --csv-stats "stats.*"
Results Output
Test permutations: [
Attributes: []
Uniform job size: 2048
Core count: 1
Per core thread count: 1
Task pool size: 1024
Data provider job count: 128
MTU size: -- not configured --
SQ depth: -- not configured --
RQ depth: -- not configured --
Input data file: -- not configured --
--------------------------------
Attributes: []
Uniform job size: 2048
Core count: 2
Per core thread count: 1
Task pool size: 1024
Data provider job count: 128
MTU size: -- not configured --
SQ depth: -- not configured --
RQ depth: -- not configured --
Input data file: -- not configured --
--------------------------------
Attributes: []
Uniform job size: 2048
Core count: 4
Per core thread count: 1
Task pool size: 1024
Data provider job count: 128
MTU size: -- not configured --
SQ depth: -- not configured --
RQ depth: -- not configured --
Input data file: -- not configured --
]
[main] Initialize framework...
[main] Start execution...
Preparing permutation 1 of 3...
Executing permutation 1 of 3...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
Cleanup permutation 1 of 3...
Aggregate stats
Duration: 5000191 micro seconds
Enqueued jobs: 22999128
Dequeued jobs: 22999128
Throughput: 004.600 MOperations/s
Ingress rate: 070.185 Gib/s
Egress rate: 070.185 Gib/s
Preparing permutation 2 of 3...
Executing permutation 2 of 3...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
Data path thread [1] started...
WT[1] Executing 100 warm-up tasks using 100 unique tasks
Cleanup permutation 2 of 3...
Stats for thread[0](core: 0)
Duration: 5000066 micro seconds
Enqueued jobs: 14409794
Dequeued jobs: 14409794
Throughput: 002.882 MOperations/s
Ingress rate: 043.975 Gib/s
Egress rate: 043.975 Gib/s
Stats for thread[1](core: 1)
Duration: 5000188 micro seconds
Enqueued jobs: 14404708
Dequeued jobs: 14404708
Throughput: 002.881 MOperations/s
Ingress rate: 043.958 Gib/s
Egress rate: 043.958 Gib/s
Aggregate stats
Duration: 5000188 micro seconds
Enqueued jobs: 28814502
Dequeued jobs: 28814502
Throughput: 005.763 MOperations/s
Ingress rate: 087.932 Gib/s
Egress rate: 087.932 Gib/s
Preparing permutation 3 of 3...
Executing permutation 3 of 3...
Data path thread [1] started...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
WT[1] Executing 100 warm-up tasks using 100 unique tasks
Data path thread [3] started...
WT[3] Executing 100 warm-up tasks using 100 unique tasks
Data path thread [2] started...
WT[2] Executing 100 warm-up tasks using 100 unique tasks
Cleanup permutation 3 of 3...
[main] Completed! tearing down...
Stats for thread[0](core: 0)
Duration: 5000092 micro seconds
Enqueued jobs: 7227025
Dequeued jobs: 7227025
Throughput: 001.445 MOperations/s
Ingress rate: 022.055 Gib/s
Egress rate: 022.055 Gib/s
Stats for thread[1](core: 1)
Duration: 5000081 micro seconds
Enqueued jobs: 7223269
Dequeued jobs: 7223269
Throughput: 001.445 MOperations/s
Ingress rate: 022.043 Gib/s
Egress rate: 022.043 Gib/s
Stats for thread[2](core: 2)
Duration: 5000047 micro seconds
Enqueued jobs: 7229678
Dequeued jobs: 7229678
Throughput: 001.446 MOperations/s
Ingress rate: 022.063 Gib/s
Egress rate: 022.063 Gib/s
Stats for thread[3](core: 3)
Duration: 5000056 micro seconds
Enqueued jobs: 7223037
Dequeued jobs: 7223037
Throughput: 001.445 MOperations/s
Ingress rate: 022.043 Gib/s
Egress rate: 022.043 Gib/s
Aggregate stats
Duration: 5000092 micro seconds
Enqueued jobs: 28903009
Dequeued jobs: 28903009
Throughput: 005.780 MOperations/s
Ingress rate: 088.203 Gib/s
Egress rate: 088.203 Gib/s
Results Overview
The output gives a summary of the permutations being carried out and then proceeds to display the statistics for each of the permutations.
The CSV output file can be seen to contain only statistics information; configuration information is not included.
There is an entry for each of the sweep permutations:
stats.input.job_count,stats.output.job_count,stats.input.byte_count,stats.output.byte_count,stats.input.throughput.bytes,stats.output.throughput.bytes,stats.input.throughput.rate,stats.output.throughput.rate
22999128,22999128,47102214144,47102214144,070.185 Gib/s,070.185 Gib/s,4.599650 MOperations/s,4.599650 MOperations/s
28814502,28814502,59012100096,59012100096,087.932 Gib/s,087.932 Gib/s,5.762683 MOperations/s,5.762683 MOperations/s
28903009,28903009,59193362432,59193362432,088.203 Gib/s,088.203 Gib/s,5.780495 MOperations/s,5.780495 MOperations/s
This test invokes DOCA Bench to execute a local DMA operation on the host.
It specifies that a uniform job size sweep should be carried out using job sizes 1024 and 2048, via the option --sweep uniform-job-size,1024,2048.
Test output is to be saved in a CSV file /tmp/dma_sweep_job_size.csv and collection of environment information is enabled.
Command Line
doca_bench --core-mask 0xff \
    --core-count 1 \
    --pipeline-steps doca_dma \
    --device d8:00.0 \
    --data-provider random-data \
    --sweep uniform-job-size,1024,2048 \
    --job-output-buffer-size 2048 \
    --run-limit-seconds 5 \
    --csv-output-file /tmp/dma_sweep_job_size.csv \
    --enable-environment-information
Results Output
Test permutations: [
Attributes: []
Uniform job size: 1024
Core count: 1
Per core thread count: 1
Task pool size: 1024
Data provider job count: 128
MTU size: -- not configured --
SQ depth: -- not configured --
RQ depth: -- not configured --
Input data file: -- not configured --
--------------------------------
Attributes: []
Uniform job size: 2048
Core count: 1
Per core thread count: 1
Task pool size: 1024
Data provider job count: 128
MTU size: -- not configured --
SQ depth: -- not configured --
RQ depth: -- not configured --
Input data file: -- not configured --
]
[main] Initialize framework...
[main] Start execution...
Preparing permutation 1 of 2...
Executing permutation 1 of 2...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
Cleanup permutation 1 of 2...
Aggregate stats
Duration: 5000083 micro seconds
Enqueued jobs: 23645128
Dequeued jobs: 23645128
Throughput: 004.729 MOperations/s
Ingress rate: 036.079 Gib/s
Egress rate: 036.079 Gib/s
Preparing permutation 2 of 2...
Executing permutation 2 of 2...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
Cleanup permutation 2 of 2...
[main] Completed! tearing down...
Aggregate stats
Duration: 5000027 micro seconds
Enqueued jobs: 22963128
Dequeued jobs: 22963128
Throughput: 004.593 MOperations/s
Ingress rate: 070.078 Gib/s
Egress rate: 070.078 Gib/s
Results Overview
The output gives a summary of the permutations being carried out and then proceeds to display the statistics for each of the permutations.
The CSV output file can be seen to contain the statistics information together with the collected environment information.
There is an entry for each of the sweep permutations.
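As a cross-check, the reported rates are consistent with the swept job sizes: for the 1024-byte permutation, 4.728947 MOperations/s × 1024 B × 8 bits ≈ 36.08 Gib/s, matching the 036.079 Gib/s recorded below, while doubling the job size to 2048 bytes roughly doubles the byte rate (070.078 Gib/s) at a near-constant operation rate.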
cfg.companion.connection_string,cfg.pipeline.steps,cfg.pipeline.use_remote_input_buffers,cfg.pipeline.use_remote_output_buffers,cfg.pipeline.bulk_latency.lower_bound,cfg.pipeline.bulk_latency.bucket_width,cfg.run_limit.duration,cfg.run_limit.jobs,cfg.run_limit.bytes,cfg.data_provider.type,cfg.data_provider.output_buffer_size,cfg.device.pci_address,cfg.input.cwd,cfg.output.cwd,cfg.warmup_job_count,cfg.core_set,cfg.benchmark_mode,cfg.warnings_are_errors,cfg.attribute.doca_compress.algorithm,cfg.attribute.doca_ec.matrix_type,cfg.attribute.doca_ec.data_block_count,cfg.attribute.doca_ec.redundancy_block_count,cfg.attribute.doca_ec.use_precomputed_matrix,cfg.attribute.doca_eth.l3_chksum_offload,cfg.attribute.doca_eth.l4_chksum_offload,cfg.attribute.doca_sha.algorithm,cfg.uniform_job_size,cfg.core_count,cfg.per_core_thread_count,cfg.task_pool_size,cfg.data_provider_job_count,cfg.sg_config,cfg.mtu-size,cfg.send-queue-size,cfg.receive-queue-size,cfg.data-provider-input-file,cfg.attribute.mmo.log_qp_depth,cfg.attribute.mmo.log_num_qps,stats.input.job_count,stats.output.job_count,stats.input.byte_count,stats.output.byte_count,stats.input.throughput.bytes,stats.output.throughput.bytes,stats.input.throughput.rate,stats.output.throughput.rate,host.pci.3.address,host.pci.3.ext_tag,host.pci.3.link_type,host.pci.2.ext_tag,host.pci.2.address,host.cpu.0.model,host.ofed_version,host.pci.4.max_read_request,host.pci.2.width,host.cpu.1.logical_cores,host.pci.2.eswitch_mode,host.pci.3.max_read_request,host.pci.4.address,host.pci.2.link_type,host.pci.1.max_read_request,host.pci.4.link_type,host.cpu.socket_count,host.pci.0.ext_tag,host.pci.6.port_speed,host.cpu.0.physical_cores,host.pci.7.port_speed,host.memory.dimm_slot_count,host.cpu.1.model,host.pci.0.max_payload_size,host.pci.6.relaxed_ordering,host.doca_host_package_version,host.pci.6.max_payload_size,host.pci.0.gen,host.pci.4.width,host.pci.2.gen,host.pci.1.max_payload_size,host.pci.4.relaxed_ordering,host.pci.3.width,host.cpu.0.logical_cores,host.cpu.0.arch,host.pci.4.port_speed,host.pci.4.eswitch_mode,host.pci.7.address,host.pci.5.eswitch_mode,host.pci.5.address,host.cpu.1.arch,host.pci.0.eswitch_mode,host.pci.7.width,host.pci.7.link_type,host.pci.1.link_type,host.pci.3.gen,host.pci.7.max_read_request,host.pci.7.eswitch_mode,host.pci.6.gen,host.pci.2.port_speed,host.pci.7.gen,host.pci.2.relaxed_ordering,host.pci.6.width,host.pci.4.gen,host.pci.6.address,host.hostname,host.pci.5.link_type,host.pci.6.link_type,host.pci.6.max_read_request,host.pci.7.max_payload_size,host.pci.5.gen,host.pci.6.eswitch_mode,host.pci.5.width,host.pci.3.relaxed_ordering,host.pci.4.ext_tag,host.pci.0.width,host.pci.5.port_speed,host.pci.2.max_payload_size,host.pci.3.max_payload_size,host.pci.5.max_payload_size,host.pci.2.max_read_request,host.pci.0.address,host.pci.gen,host.os.family,host.pci.1.gen,host.pci.5.relaxed_ordering,host.pci.1.port_speed,host.pci.7.ext_tag,host.pci.1.address,host.pci.3.eswitch_mode,host.pci.3.port_speed,host.pci.0.max_read_request,host.pci.1.ext_tag,host.pci.0.relaxed_ordering,host.pci.0.link_type,host.pci.5.max_read_request,host.pci.4.max_payload_size,host.pci.device_count,host.memory.populated_dimm_count,host.memory.installed_capacity,host.pci.6.ext_tag,host.os.kernel_version,host.pci.0.port_speed,host.pci.1.width,host.pci.7.relaxed_ordering,host.pci.1.relaxed_ordering,host.os.version,host.os.name,host.cpu.1.physical_cores,host.numa_node_count,host.pci.5.ext_tag,host.pci.1.eswitch_mode
,[doca_dma],0,0,10000,1000,5,,,random-data,2048,d8:00.0,,,100,"[0, 1, 2, 3, 4, 5, 6, 7]",throughput,0,,,,,,,,,1024,1,1,1024,128,1 fragments,,,,,,,23645128,23645128,24212611072,24212611072,036.079 Gib/s,036.079 Gib/s,4.728947 MOperations/s,4.728947 MOperations/s,0000:5e:00.1,true,Infiniband,true,0000:5e:00.0,N/A,OFED-internal-24.04-0.4.8,N/A,x63,N/A,N/A,N/A,0000:af:00.0,Infiniband,N/A,Ethernet,2,true,N/A,N/A,N/A,N/A,N/A,N/A,true,<none>,N/A,Gen15,x63,Gen15,N/A,true,x63,N/A,x86_64,104857600000,N/A,0000:d8:00.1,N/A,0000:af:00.1,x86_64,N/A,x63,Ethernet,Infiniband,Gen15,N/A,N/A,Gen15,N/A,Gen15,true,x63,Gen15,0000:d8:00.0,zibal,Ethernet,Ethernet,N/A,N/A,Gen15,N/A,x63,true,true,x63,104857600000,N/A,N/A,N/A,N/A,0000:3b:00.0,N/A,Linux,Gen15,true,N/A,true,0000:3b:00.1,N/A,N/A,N/A,true,true,Infiniband,N/A,N/A,8,N/A,270049112064,true,5.4.0-174-generic,N/A,x63,true,true,20.04.1 LTS (Focal Fossa),Ubuntu,N/A,2,true,N/A
,[doca_dma],0,0,10000,1000,5,,,random-data,2048,d8:00.0,,,100,"[0, 1, 2, 3, 4, 5, 6, 7]",throughput,0,,,,,,,,,2048,1,1,1024,128,1 fragments,,,,,,,22963128,22963128,47028486144,47028486144,070.078 Gib/s,070.078 Gib/s,4.592600 MOperations/s,4.592600 MOperations/s,0000:5e:00.1,true,Infiniband,true,0000:5e:00.0,N/A,OFED-internal-24.04-0.4.8,N/A,x63,N/A,N/A,N/A,0000:af:00.0,Infiniband,N/A,Ethernet,2,true,N/A,N/A,N/A,N/A,N/A,N/A,true,<none>,N/A,Gen15,x63,Gen15,N/A,true,x63,N/A,x86_64,104857600000,N/A,0000:d8:00.1,N/A,0000:af:00.1,x86_64,N/A,x63,Ethernet,Infiniband,Gen15,N/A,N/A,Gen15,N/A,Gen15,true,x63,Gen15,0000:d8:00.0,zibal,Ethernet,Ethernet,N/A,N/A,Gen15,N/A,x63,true,true,x63,104857600000,N/A,N/A,N/A,N/A,0000:3b:00.0,N/A,Linux,Gen15,true,N/A,true,0000:3b:00.1,N/A,N/A,N/A,true,true,Infiniband,N/A,N/A,8,N/A,270049112064,true,5.4.0-174-generic,N/A,x63,true,true,20.04.1 LTS (Focal Fossa),Ubuntu,N/A,2,true,N/A
This test invokes DOCA Bench to execute a remote DMA operation on the host.
It specifies the companion connection details to be used on the host and that remote output buffers are to be used.
Command Line
doca_bench --core-list 12 \
    --pipeline-steps doca_dma \
    --device 03:00.0 \
    --data-provider random-data \
    --uniform-job-size 2048 \
    --job-output-buffer-size 2048 \
    --use-remote-output-buffers \
    --companion-connection-string proto=tcp,port=12345,mode=host,dev=17:00.0,user=bob,addr=10.10.10.10 \
    --run-limit-seconds 5
Results Output
Executing...
Worker thread[0](core: 12) [doca_dma] started...
Worker thread[0] Executing 100 warm-up tasks using 100 unique tasks
Cleanup...
[main] Completed! tearing down...
Aggregate stats
Duration: 5000073 micro seconds
Enqueued jobs: 32202128
Dequeued jobs: 32202128
Throughput: 006.440 MOperations/s
Ingress rate: 098.272 Gib/s
Egress rate: 098.272 Gib/s
Results Overview
None.
This test is relevant for BlueField-2 only.
This test invokes DOCA Bench to run compression using random data as input.
The compression algorithm specified is "deflate".
Command Line
doca_bench --core-list 2 \
    --pipeline-steps doca_compress::compress \
    --device 03:00.0 \
    --data-provider random-data \
    --uniform-job-size 2048 \
    --job-output-buffer-size 4096 \
    --run-limit-seconds 3 \
    --attribute doca_compress.algorithm="deflate"
Results Output
Executing...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
Cleanup...
[main] Completed! tearing down...
Aggregate stats
Duration: 3000146 micro seconds
Enqueued jobs: 5340128
Dequeued jobs: 5340128
Throughput: 001.780 MOperations/s
Ingress rate: 027.160 Gib/s
Egress rate: 027.748 Gib/s
Results Overview
None
This test invokes DOCA Bench to run decompression of LZ4-compressed data.
This test specifies a file-set data provider which contains the filename of an LZ4-compressed file.
Remote output buffers are specified to be used for the output jobs.
It specifies the companion connection details to be used on the host for the remote output buffers.
Command Line
doca_bench --core-list 12 \
    --pipeline-steps doca_compress::decompress \
    --device 03:00.0 \
    --data-provider file-set \
    --data-provider-input-file lz4_compressed_64b_buffers.fs \
    --job-output-buffer-size 4096 \
    --run-limit-seconds 3 \
    --attribute doca_compress.algorithm="lz4" \
    --use-remote-output-buffers \
    --companion-connection-string proto=tcp,port=12345,mode=host,dev=17:00.0,user=bob,addr=10.10.10.10
Results Output
Executing...
Worker thread[0](core: 12) [doca_compress::decompress] started...
Worker thread[0] Executing 100 warm-up tasks using 100 unique tasks
Cleanup...
[main] Completed! tearing down...
Aggregate stats
Duration: 3000043 micro seconds
Enqueued jobs: 15306128
Dequeued jobs: 15306128
Throughput: 005.102 MOperations/s
Ingress rate: 003.155 Gib/s
Egress rate: 002.433 Gib/s
Results Comment
None
This test invokes DOCA Bench to run the EC creation step.
It runs in bulk latency mode and specifies the doca_ec attributes of data_block_count, redundancy_block_count, and matrix_type.
Command Line
doca_bench --mode bulk-latency \
    --core-list 12 \
    --pipeline-steps doca_ec::create \
    --device 17:00.0 \
    --data-provider random-data \
    --uniform-job-size 1024 \
    --job-output-buffer-size 1024 \
    --run-limit-seconds 3 \
    --attribute doca_ec.data_block_count=16 \
    --attribute doca_ec.redundancy_block_count=16 \
    --attribute doca_ec.matrix_type=cauchy
Results Output
None.
Results Comment
Bulk-latency output will be similar to that presented for the bulk-latency AES-GCM sample earlier on this page.
This test invokes DOCA Bench to run the EC creation step.
It runs in precision latency mode and specifies the doca_ec attributes of data_block_count, redundancy_block_count, and matrix_type.
Command Line
doca_bench --mode precision-latency \
    --core-list 12 \
    --pipeline-steps doca_ec::create \
    --device 03:00.0 \
    --data-provider random-data \
    --uniform-job-size 1024 \
    --job-output-buffer-size 1024 \
    --run-limit-jobs 5000 \
    --attribute doca_ec.data_block_count=16 \
    --attribute doca_ec.redundancy_block_count=16 \
    --attribute doca_ec.matrix_type=cauchy
Results Output
None
Results Comment
Precision latency output will be similar to that presented earlier on this page.
This test invokes DOCA Bench in Comch consumer mode using a core-list on the host side and BlueField side.
The run-limit is 500 jobs.
Command Line
./doca_bench --core-list 4 \
    --warm-up-jobs 32 \
    --pipeline-steps doca_comch::consumer \
    --device ca:00.0 \
    --data-provider random-data \
    --run-limit-jobs 500 \
    --core-count 1 \
    --uniform-job-size 4096 \
    --job-output-buffer-size 4096 \
    --companion-connection-string proto=tcp,mode=dpu,dev=03:00.0,user=bob,addr=10.10.10.10,port=12345 \
    --attribute dopt.companion_app.path=<path to DPU doca_bench_companion application location> \
    --data-provider-job-count 256 \
    --companion-core-list 12
Results Output
[main] Completed! tearing down...
Aggregate stats
Duration: 1415 micro seconds
Enqueued jobs: 500
Dequeued jobs: 500
Throughput: 000.353 MOperations/s
Ingress rate: 000.000 Gib/s
Egress rate: 010.782 Gib/s
Results Comment
The aggregate statistics show the test completed after 500 jobs were processed.
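The figures are also self-consistent: 500 jobs in 1415 microseconds ≈ 0.353 MOperations/s, and 0.353 MOperations/s × 4096 B × 8 bits ≈ 10.78 Gib/s, matching the reported egress rate.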
This test invokes DOCA Bench in Comch producer mode using a core-list on the host side and BlueField side.
The run-limit is 500 jobs.
Command Line
doca_bench --core-list 4 \
    --warm-up-jobs 32 \
    --pipeline-steps doca_comch::producer \
    --device ca:00.0 \
    --data-provider random-data \
    --run-limit-jobs 500 \
    --core-count 1 \
    --uniform-job-size 4096 \
    --job-output-buffer-size 4096 \
    --companion-connection-string proto=tcp,mode=dpu,dev=03:00.0,user=bob,addr=10.10.10.10,port=12345 \
    --attribute dopt.companion_app.path=<path to DPU doca_bench_companion location> \
    --data-provider-job-count 256 \
    --companion-core-list 12
Results Output
[main] Completed! tearing down...
Aggregate stats
Duration: 407 micro seconds
Enqueued jobs: 500
Dequeued jobs: 500
Throughput: 001.226 MOperations/s
Ingress rate: 037.402 Gib/s
Egress rate: 000.000 Gib/s
Results Comment
The aggregate statistics show the test completed after 500 jobs were processed.
This test invokes DOCA Bench in RDMA send mode using a core-list on the send and receive side.
The send queue size is configured to 50 entries.
Command Line
doca_bench --pipeline-steps doca_rdma::send \
    --device d8:00.0 \
    --data-provider random-data \
    --uniform-job-size 2048 \
    --job-output-buffer-size 2048 \
    --run-limit-seconds 3 \
    --send-queue-size 50 \
    --companion-connection-string proto=tcp,addr=10.10.10.10,port=12345,user=bob,dev=ca:00.0 \
    --companion-core-list 12 \
    --core-list 12
Results Output
Test permutations: [
Attributes: []
Uniform job size: 2048
Core count: 1
Per core thread count: 1
Task pool size: 1024
Data provider job count: 128
MTU size: -- not configured --
SQ depth: 50
RQ depth: -- not configured --
Input data file: -- not configured --
]
Results Comment
The configuration output shows the send queue size configured to 50.
This test invokes DOCA Bench in RDMA receive mode using a core-list on the send and receive side.
The receive queue size is configured to 100 entries.
Command Line
doca_bench --pipeline-steps doca_rdma::receive \
    --device d8:00.0 \
    --data-provider random-data \
    --uniform-job-size 2048 \
    --job-output-buffer-size 2048 \
    --run-limit-seconds 3 \
    --receive-queue-size 100 \
    --companion-connection-string proto=tcp,addr=10.10.10.10,port=12345,user=bob,dev=ca:00.0 \
    --companion-core-list 12 \
    --core-list 12
Results Output
Test permutations: [
Attributes: []
Uniform job size: 2048
Core count: 1
Per core thread count: 1
Task pool size: 1024
Data provider job count: 128
MTU size: -- not configured --
SQ depth: -- not configured --
RQ depth: 100
Input data file: -- not configured --
]
Results Overview
The configuration output shows the receive queue size configured to 100.