DOCA Bench Sample Invocations

1.0

On This Page

This guide provides examples of various invocations of the tool to help provide guidance and insight into it and the feature under test.

Info

To make the samples clearer, certain verbose output and repeated information has been removed or shortened, in particular to output of the configuration or defaults when DOCA Bench is first executed is removed.

Info

The command line options may need to be updated to suit your environment (e.g., TCP addresses, port numbers, interface names, usernames). See the “Command-line Parameters” section for more information.

  • This test invokes DOCA Bench to run in Ethernet receive mode, configured to receive Ethernet frames of size 1500 bytes.

  • The test runs for 3 seconds using a single core and use a maximum burst size of 512 frames.

  • The test runs in the default throughput mode, with throughput figures displayed at the end of the test run.

  • The companion application uses 6 cores to continuously transmit Ethernet frames of size 1500 bytes until it is stopped by DOCA Bench.

Command Line

Copy
Copied!
            

doca_bench --core-mask 0x02 \                                 --pipeline-steps doca_eth::rx \                                     --device b1:00.1 \                                     --data-provider random-data \                                     --uniform-job-size 1500 \                                     --run-limit-seconds 3 \                                     --attribute doca_eth.max-burst-size=512 \                                     --companion-connection-string proto=tcp,addr=10.10.10.10,port=12345,user=bob,dev=ens4f1np1 \                                     --attribute doption.companion_app.path=/opt/mellanox/doca/tools/doca_bench_companion \                                     --companion-core-list 6 \                                     --job-output-buffer-size 1500 \                                     --mtu-size raw_eth


Results Output

Copy
Copied!
            

[main] doca_bench : 2.7.0084 [main] release build + + + + + + + + + + + + + + + + + + + + + + + + + + DOCA bench supported modules: [doca_comm_channel, doca_compress, doca_dma, doca_ec, doca_eth, doca_sha, doca_comch, doca_rdma, doca_aes_gcm] + + + + + + + + + + + + + + + + + + + + + + + + + +   DOCA bench configuration Static configuration: [         Attributes: [doca_eth.l4-chksum-offload:false, doca_eth.max-burst-size:512, doption.companion_app.path:/opt/mellanox/doca/tools/doca_bench_companion, doca_eth.l3-chksum-offload:false]         Companion configuration: [                 Device: ens4f1np1                 Remote IP address: "bob@10.10.10.10"                 Core set: [6]         ]         Pipelines: [                 Steps: [                         name: "doca_eth::rx"                         attributes: []                 ]                 Use remote input buffers: no                 Use remote output buffers: no                 Latency bucket_range: 10000ns-110000ns         ]         Run limits: [                 Max execution time: 3seconds                 Max jobs executed: -- not configured --                 Max bytes processed: -- not configured --         ]         Data provider: [                 Name: "random-data"                 Job output buffer size: 1500         ]         Device: "b1:00.1"         Device representor: "-- not configured --"         Warm up job count: 100         Input files dir: "-- not configured --"         Output files dir: "-- not configured --"         Core set: [1]         Benchmark mode: throughput         Warnings as errors: no         CSV output: [                 File name: -- not configured --                 Selected stats: []                 Deselected stats: []                 Separate dynamic values: no                 Collect environment information: no                 Append to stats file: no         ] ] Test permutations: [         Attributes: []         Uniform job size: 1500         Core count: 1         Per core thread count: 1         Task pool size: 1024         Data provider job count: 128         MTU size: ETH_FRAME         SQ depth: -- not configured --         RQ depth: -- not configured --         Input data file: -- not configured -- ]   [main] Initialize framework... [main] Start execution... Preparing... EAL: Detected CPU lcores: 36 EAL: Detected NUMA nodes: 4 EAL: Detected shared linkage of DPDK EAL: Multi-process socket /run/user/48679/dpdk/rte/mp_socket EAL: Selected IOVA mode 'PA' EAL: VFIO support initialized TELEMETRY: No legacy callbacks, legacy socket not created EAL: Probe PCI driver: mlx5_pci (15b3:a2d6) device: 0000:b1:00.1 (socket 2) [08:19:32:110524][398304][DOCA][WRN][engine_model.c:90][adapt_queue_depth] adapting queue depth to 128. Executing... Data path thread [0] started... WT[0] Executing 100 warm-up tasks using 100 unique tasks Cleanup... [main] Completed! tearing down... Aggregate stats         Duration:      3000633 micro seconds         Enqueued jobs: 611215         Dequeued jobs: 611215         Throughput:    000.204 MOperations/s         Ingress rate:  002.276 Gib/s         Egress rate:   002.276 Gib/s


Results Overview

As a single core is specified, there is a single section of statistics output displayed.

  • This test invokes DOCA Bench to run in Ethernet send mode, configured to transmit Ethernet frames of size 1500 bytes

  • Random data is used to populate the Ethernet frames

  • The test runs for 3 seconds using a single core and uses a maximum burst size of 512 frames

  • L3 and L4 checksum offloading is not enabled

  • The test runs in the default throughput mode, with throughput figures displayed at the end of the test run

  • The companion application uses 6 cores to continuously receive Ethernet frames of size 1500 bytes until it is stopped by DOCA Bench

Command Line

Copy
Copied!
            

doca_bench --core-mask 0x02 \                                     --pipeline-steps doca_eth::tx \                                     --device b1:00.1 \                                     --data-provider random-data \                                     --uniform-job-size 1500 \                                     --run-limit-seconds 3 \                                     --attribute doca_eth.max-burst-size=512 \                                     --attribute doca_eth.l4-chksum-offload=false \                                     --attribute doca_eth.l3-chksum-offload=false \                                     --companion-connection-string proto=tcp,addr=10.10.10.10,port=12345,user=bob,dev=ens4f1np1 \                                     --attribute doption.companion_app.path=/opt/mellanox/doca/tools/doca_bench_companion \                                     --companion-core-list 6 \                                     --job-output-buffer-size 1500


Results Output

Copy
Copied!
            

[main] doca_bench : 2.7.0084 [main] release build + + + + + + + + + + + + + + + + + + + + + + + + + + DOCA bench supported modules: [doca_comm_channel, doca_compress, doca_dma, doca_ec, doca_eth, doca_sha, doca_comch, doca_rdma, doca_aes_gcm] + + + + + + + + + + + + + + + + + + + + + + + + + +   DOCA bench configuration Static configuration: [         Attributes: [doca_eth.l4-chksum-offload:false, doca_eth.max-burst-size:512, doption.companion_app.path:/opt/mellanox/doca/tools/doca_bench_companion, doca_eth.l3-chksum-offload:false]         Companion configuration: [                 Device: ens4f1np1                 Remote IP address: "bob@10.10.10.10"                 Core set: [6]         ]         Pipelines: [                 Steps: [                         name: "doca_eth::tx"                         attributes: []                 ]                 Use remote input buffers: no                 Use remote output buffers: no                 Latency bucket_range: 10000ns-110000ns         ]         Run limits: [                 Max execution time: 3seconds                 Max jobs executed: -- not configured --                 Max bytes processed: -- not configured --         ]         Data provider: [                 Name: "random-data"                 Job output buffer size: 1500         ]         Device: "b1:00.1"         Device representor: "-- not configured --"         Warm up job count: 100         Input files dir: "-- not configured --"         Output files dir: "-- not configured --"         Core set: [1]         Benchmark mode: throughput         Warnings as errors: no         CSV output: [                 File name: -- not configured --                 Selected stats: []                 Deselected stats: []                 Separate dynamic values: no                 Collect environment information: no                 Append to stats file: no         ] ] Test permutations: [         Attributes: []         Uniform job size: 1500         Core count: 1         Per core thread count: 1         Task pool size: 1024         Data provider job count: 128         MTU size: -- not configured --         SQ depth: -- not configured --         RQ depth: -- not configured --         Input data file: -- not configured -- ]   [main] Initialize framework... [main] Start execution... Preparing... Executing... Data path thread [0] started... WT[0] Executing 100 warm-up tasks using 100 unique tasks Cleanup... [main] Completed! tearing down... Aggregate stats         Duration:      3000049 micro seconds         Enqueued jobs: 17135128         Dequeued jobs: 17135128         Throughput:    005.712 MOperations/s         Ingress rate:  063.832 Gib/s         Egress rate:   063.832 Gib/s


Results Overview

As a single core is specified, there is a single section of statistics output displayed.

  • This test invokes DOCA Bench on the x86 host side to run the AES-GM Decryption step

  • A file-set file is used to indicate which file is to be decrypted. The content of the file-set file lists the filename to be decrypted.

  • The key to be used for the encryption and decryption is specified using the doca_aes_gcm.key-file attribute. This contains the key to be used.

  • It will run until 5000 jobs have been processed

  • It runs in the precision-latency mode, with latency and throughput figures displayed at the end of the test run

  • A core mask is specified to indicate that cores 12, 13, 14, and 15 are to be used for this test

Command Line

Copy
Copied!
            

doca_bench  --mode precision-latency \             --core-mask 0xf000 \             --warm-up-jobs 32 \             --device 17:00.0 \             --data-provider file-set \             --data-provider-input-file aes_64_128.fileset \             --run-limit-jobs 5000 \             --pipeline-steps doca_aes_gcm::decrypt \             --attribute doca_aes_gcm.key-file='aes128.key' \             --job-output-buffer-size 80


Results Output

Copy
Copied!
            

[main] Completed! tearing down... Worker thread[0](core: 12) stats:         Duration:      10697 micro seconds         Enqueued jobs: 5000         Dequeued jobs: 5000         Throughput:    000.467 MOperations/s         Ingress rate:  000.265 Gib/s         Egress rate:   000.223 Gib/s Worker thread[1](core: 13) stats:         Duration:      10700 micro seconds         Enqueued jobs: 5000         Dequeued jobs: 5000         Throughput:    000.467 MOperations/s         Ingress rate:  000.265 Gib/s         Egress rate:   000.223 Gib/s Worker thread[2](core: 14) stats:         Duration:      10733 micro seconds         Enqueued jobs: 5000         Dequeued jobs: 5000         Throughput:    000.466 MOperations/s         Ingress rate:  000.264 Gib/s         Egress rate:   000.222 Gib/s Worker thread[3](core: 15) stats:         Duration:      10788 micro seconds         Enqueued jobs: 5000         Dequeued jobs: 5000         Throughput:    000.463 MOperations/s         Ingress rate:  000.262 Gib/s         Egress rate:   000.221 Gib/s Aggregate stats         Duration:      10788 micro seconds         Enqueued jobs: 20000         Dequeued jobs: 20000         Throughput:    001.854 MOperations/s         Ingress rate:  001.050 Gib/s         Egress rate:   000.884 Gib/s         min:           1878 ns         max:           4956 ns         median:        2134 ns         mean:          2145 ns         90th %ile:     2243 ns         95th %ile:     2285 ns         99th %ile:     2465 ns         99.9th %ile:   3193 ns         99.99th %ile:  4487 ns


Results Overview

Since a core mask is specified but no core count, then all cores in the mask are used.

There is a section of statistics displayed for each core used as well as the aggregate statistics.

  • This test invokes DOCA Bench on the BlueField side to run the AES-GM encryption step

  • A text file of size 2KB is the input for the encryption stage

  • The key to be used for the encryption and decryption is specified using the doca_aes_gcm.key attribute

  • It runs until 2000 jobs have been processed

  • It runs in the bulk-latency mode, with latency and throughput figures displayed at the end of the test run

  • A single core is specified with 2 threads

Command Line

Copy
Copied!
            

doca_bench  --mode bulk-latency \             --core-list 3 \             --threads-per-core 2 \             --warm-up-jobs 32 \             --device 03:00.0 \             --data-provider file \             --data-provider-input-file plaintext_2k.txt \             --run-limit-jobs 2000 \             --pipeline-steps doca_aes_gcm::encrypt \             --attribute doca_aes_gcm.key="0123456789abcdef0123456789abcdef" \             --uniform-job-size 2048 \             --job-output-buffer-size 4096


Results Output

Copy
Copied!
            

[main] Completed! tearing down... Worker thread[0](core: 3) stats:         Duration:      501 micro seconds         Enqueued jobs: 2048         Dequeued jobs: 2048         Throughput:    004.082 MOperations/s         Ingress rate:  062.279 Gib/s         Egress rate:   062.644 Gib/s Worker thread[1](core: 3) stats:         Duration:      466 micro seconds         Enqueued jobs: 2048         Dequeued jobs: 2048         Throughput:    004.386 MOperations/s         Ingress rate:  066.922 Gib/s         Egress rate:   067.314 Gib/s Aggregate stats         Duration:      501 micro seconds         Enqueued jobs: 4096         Dequeued jobs: 4096         Throughput:    008.163 MOperations/s         Ingress rate:  124.558 Gib/s         Egress rate:   125.287 Gib/s Latency report:                    :                    :                    :                    :                    :                    ::                    ::                    ::                    ::                   .::.  .    .    .. ------------------------------------------------------------------------------------------------------ [<10000ns]: 0 .. OUTPUT RETRACTED (SHORTENED) .. [26000ns -> 26999ns]: 0 [27000ns -> 27999ns]: 128 [28000ns -> 28999ns]: 2176 [29000ns -> 29999ns]: 1152 [30000ns -> 30999ns]: 128 [31000ns -> 31999ns]: 0 [32000ns -> 32999ns]: 0 [33000ns -> 33999ns]: 128 [34000ns -> 34999ns]: 0 [35000ns -> 35999ns]: 0 [36000ns -> 36999ns]: 0 [37000ns -> 37999ns]: 0 [38000ns -> 38999ns]: 128 [39000ns -> 39999ns]: 0 [40000ns -> 40999ns]: 0 [41000ns -> 41999ns]: 0 [42000ns -> 42999ns]: 0 [43000ns -> 43999ns]: 128 [44000ns -> 44999ns]: 128 [45000ns -> 45999ns]: 0 .. OUTPUT RETRACTED (SHORTENED) .. [>110000ns]: 0


Results Overview

Since a single core is specified, there is a single section of statistics output displayed.

  • This test invokes DOCA Bench on the host side to run 2 AES-GM steps in the pipeline, first to encrypt a text file and then to decrypt the associated output from the encrypt step

  • A text file of size 2KB is the input for the encryption stage

  • The input-cwd option instructs DOCA Bench to look in a different location for the input file, in the parent directory in this case

  • The key to be used for the encryption and decryption is specified using the doca_aes_gcm.key-file attribute, indicating that the key can be found in the specified file

  • It runs until 204800 bytes have been processed

  • It runs in the default throughput mode, with throughput figures displayed at the end of the test run

Command Line

Copy
Copied!
            

doca_bench  --core-mask 0xf00 \             --core-count 1 \             --warm-up-jobs 32 \             --device 17:00.0 \             --data-provider file \             --input-cwd ../. \             --data-provider-input-file plaintext_2k.txt \             --run-limit-bytes 204800 \             --pipeline-steps doca_aes_gcm::encrypt,doca_aes_gcm::decrypt \             --attribute doca_aes_gcm.key-file='aes128.key' \             --uniform-job-size 2048 \             --job-output-buffer-size 4096


Results Output

Copy
Copied!
            

Executing... Worker thread[0](core: 8) [doca_aes_gcm::encrypt>>doca_aes_gcm::decrypt] started... Worker thread[0] Executing 32 warm-up tasks using 32 unique tasks Cleanup... [main] Completed! tearing down... Aggregate stats         Duration:      79 micro seconds         Enqueued jobs: 214         Dequeued jobs: 214         Throughput:    002.701 MOperations/s         Ingress rate:  041.214 Gib/s         Egress rate:   041.214 Gib/s


Results Overview

Since a single core is specified, there is a single section of statistics output displayed.

  • This test invokes DOCA Bench on the host side to execute the SHA operation using the SHA256 algorithm and to create a CSV file containing the test configuration and statistics

  • A list of 1 core is provided with a count of 2 threads per core

Command Line

Copy
Copied!
            

doca_bench  --core-mask 2 \             --threads-per-core 2 \             --pipeline-steps doca_sha \             --device d8:00.0 \             --data-provider random-data \             --uniform-job-size 2048 \             --job-output-buffer-size 2048 \             --run-limit-seconds 3 \             --attribute doca_sha.algorithm=sha256 \             --warm-up-jobs 100 \             --csv-output-file /tmp/sha_256_test.csv


Results Output

Copy
Copied!
            

Executing... Data path thread [0] started... WT[0] Executing 100 warm-up tasks using 100 unique tasks Data path thread [1] started... WT[1] Executing 100 warm-up tasks using 100 unique tasks Cleanup... [main] Completed! tearing down... Stats for thread[0](core: 1)         Duration:      3000064 micro seconds         Enqueued jobs: 3713935         Dequeued jobs: 3713935         Throughput:    001.238 MOperations/s         Ingress rate:  018.890 Gib/s         Egress rate:   000.295 Gib/s Stats for thread[1](core: 1)         Duration:      3000056 micro seconds         Enqueued jobs: 3757335         Dequeued jobs: 3757335         Throughput:    001.252 MOperations/s         Ingress rate:  019.110 Gib/s         Egress rate:   000.299 Gib/s Aggregate stats         Duration:      3000064 micro seconds         Enqueued jobs: 7471270         Dequeued jobs: 7471270         Throughput:    002.490 MOperations/s         Ingress rate:  038.000 Gib/s         Egress rate:   000.594 Gib/s


Results Overview

As a single core has been specified with a thread count of 2, there are statistics displayed for each thread as well as the aggregate statistics.

It can also be observed that 2 threads are started on core 1 with each thread executing the warm-up jobs.

The contents of the /tmp/sha_256_test.csv are shown below. It can be seen that the configuration used for the test and the associated statistics from the test run are listed:

Copy
Copied!
            

cfg.companion.connection_string,cfg.pipeline.steps,cfg.pipeline.use_remote_input_buffers,cfg.pipeline.use_remote_output_buffers,cfg.pipeline.bulk_latency.lower_bound,cfg.pipeline.bulk_latency.bucket_width,cfg.run_limit.duration,cfg.r    un_limit.jobs,cfg.run_limit.bytes,cfg.data_provider.type,cfg.data_provider.output_buffer_size,cfg.device.pci_address,cfg.input.cwd,cfg.output.cwd,cfg.warmup_job_count,cfg.core_set,cfg.benchmark_mode,cfg.warnings_are_errors,cfg.attrib    ute.doca_compress.algorithm,cfg.attribute.doca_ec.matrix_type,cfg.attribute.doca_ec.data_block_count,cfg.attribute.doca_ec.redundancy_block_count,cfg.attribute.doca_ec.use_precomputed_matrix,cfg.attribute.doca_eth.l3_chksum_offload,c    fg.attribute.doca_eth.l4_chksum_offload,cfg.attribute.doca_sha.algorithm,cfg.uniform_job_size,cfg.core_count,cfg.per_core_thread_count,cfg.task_pool_size,cfg.data_provider_job_count,cfg.sg_config,cfg.mtu-size,cfg.send-queue-size,cfg.    receive-queue-size,cfg.data-provider-input-file,cfg.attribute.mmo.log_qp_depth,cfg.attribute.mmo.log_num_qps,stats.input.job_count,stats.output.job_count,stats.input.byte_count,stats.output.byte_count,stats.input.throughput.bytes,sta    ts.output.throughput.bytes,stats.input.throughput.rate,stats.output.throughput.rate ,[doca_sha],0,0,10000,1000,3,,,random-data,2048,d8:00.0,,,100,[1],throughput,0,,,,,,,,sha256,2048,1,2,1024,128,1 fragments,,,,,,,7471270,7471270,15301160960,239109312,038.000 Gib/s,000.594 Gib/s,2.490370 MOperations/s,2.490370 MOpera    tions/s


  • This test invokes DOCA Bench on the Host side to execute the SHA operation using the SHA512 algorithm and to create a csv file containing the test configuration and statistics,

  • The command is repeated with the added option of csv-append-mode. This instructs DOCA Bench to append the test run statistics to the existing csv file.

  • A list of 1 core is provided with a count of 2 threads per core.

Command Line

  1. Create the initial /tmp/sha_512_test.csv file:

    Copy
    Copied!
                

    doca_bench    --core-list 2 \                 --threads-per-core 2 \                 --pipeline-steps doca_sha \                 --device d8:00.0 \                 --data-provider random-data \                 --uniform-job-size 2048 \                 --job-output-buffer-size 2048 \                 --run-limit-seconds 3 \                 --attribute doca_sha.algorithm=sha512 \                 --warm-up-jobs 100 \                 --csv-output-file /tmp/sha_512_test.csv

  2. The second command is:

    Copy
    Copied!
                

    ./doca_bench    --core-list 2 \                 --threads-per-core 2 \                 --pipeline-steps doca_sha \                 --device d8:00.0 \                 --data-provider random-data \                 --uniform-job-size 2048 \                 --job-output-buffer-size 2048 \                 --run-limit-seconds 3 \                 --attribute doca_sha.algorithm=sha512 \                 --warm-up-jobs 100 \                 --csv-output-file /tmp/sha_512_test.csv \                 --csv-append-mode

    This causes DOCA Bench to append the configuration and statistics from the second command run to the /tmp/sha_512_test.csv file.

Results Output

This is a snapshot of the results output from the first command run:

Copy
Copied!
            

Executing... Data path thread [0] started... WT[0] Executing 100 warm-up tasks using 100 unique tasks Data path thread [1] started... WT[1] Executing 100 warm-up tasks using 100 unique tasks Cleanup... [main] Completed! tearing down... Stats for thread[0](core: 2)         Duration:      3015185 micro seconds         Enqueued jobs: 3590717         Dequeued jobs: 3590717         Throughput:    001.191 MOperations/s         Ingress rate:  018.171 Gib/s         Egress rate:   000.568 Gib/s Stats for thread[1](core: 2)         Duration:      3000203 micro seconds         Enqueued jobs: 3656044         Dequeued jobs: 3656044         Throughput:    001.219 MOperations/s         Ingress rate:  018.594 Gib/s         Egress rate:   000.581 Gib/s Aggregate stats         Duration:      3015185 micro seconds         Enqueued jobs: 7246761         Dequeued jobs: 7246761         Throughput:    002.403 MOperations/s         Ingress rate:  036.673 Gib/s         Egress rate:   001.146 Gib/s

This is a snapshot of the results output from the second command run:

Copy
Copied!
            

Executing... Data path thread [0] started... WT[0] Executing 100 warm-up tasks using 100 unique tasks Data path thread [1] started... WT[1] Executing 100 warm-up tasks using 100 unique tasks Cleanup... [main] Completed! tearing down... Stats for thread[0](core: 2)         Duration:      3000072 micro seconds         Enqueued jobs: 3602562         Dequeued jobs: 3602562         Throughput:    001.201 MOperations/s         Ingress rate:  018.323 Gib/s         Egress rate:   000.573 Gib/s Stats for thread[1](core: 2)         Duration:      3000062 micro seconds         Enqueued jobs: 3659148         Dequeued jobs: 3659148         Throughput:    001.220 MOperations/s         Ingress rate:  018.611 Gib/s         Egress rate:   000.582 Gib/s Aggregate stats         Duration:      3000072 micro seconds         Enqueued jobs: 7261710         Dequeued jobs: 7261710         Throughput:    002.421 MOperations/s         Ingress rate:  036.934 Gib/s         Egress rate:   001.154 Gib/s


Results Overview

Since a single core has been specified with a thread count of 2, there are statistics displayed for each thread as well as the aggregate statistics.

It can also be observed that 2 threads are started on core 1 with each thread executing the warm-up jobs.

The contents of the /tmp/sha_256_test.csv, after the first command has been run, are shown below. It can be seen that the configuration used for the test and the associated statistics from the test run are listed:

Copy
Copied!
            

cfg.companion.connection_string,cfg.pipeline.steps,cfg.pipeline.use_remote_input_buffers,cfg.pipeline.use_remote_output_buffers,cfg.pipeline.bulk_latency.lower_bound,cfg.pipeline.bulk_latency.bucket_width,cfg.run_limit.duration,cfg.run_limit.jobs,cfg.run_limit.bytes,cfg.data_provider.type,cfg.data_provider.output_buffer_size,cfg.device.pci_address,cfg.input.cwd,cfg.output.cwd,cfg.warmup_job_count,cfg.core_set,cfg.benchmark_mode,cfg.warnings_are_errors,cfg.attribute.doca_compress.algorithm,cfg.attribute.doca_ec.matrix_type,cfg.attribute.doca_ec.data_block_count,cfg.attribute.doca_ec.redundancy_block_count,cfg.attribute.doca_ec.use_precomputed_matrix,cfg.attribute.doca_eth.l3_chksum_offload,cfg.attribute.doca_eth.l4_chksum_offload,cfg.attribute.doca_sha.algorithm,cfg.uniform_job_size,cfg.core_count,cfg.per_core_thread_count,cfg.task_pool_size,cfg.data_provider_job_count,cfg.sg_config,cfg.mtu-size,cfg.send-queue-size,cfg.receive-queue-size,cfg.data-provider-input-file,cfg.attribute.mmo.log_qp_depth,cfg.attribute.mmo.log_num_qps,stats.input.job_count,stats.output.job_count,stats.input.byte_count,stats.output.byte_count,stats.input.throughput.bytes,stats.output.throughput.bytes,stats.input.throughput.rate,stats.output.throughput.rate ,[doca_sha],0,0,10000,1000,3,,,random-data,2048,d8:00.0,,,100,[2],throughput,0,,,,,,,,sha512,2048,1,2,1024,128,1 fragments,,,,,,,7246761,7246761,14841366528,463850048,036.673 Gib/s,001.146 Gib/s,2.403422 MOperations/s,2.403422 MOperations/s

The contents of the /tmp/sha_256_test.csv, after the second command has been run, are shown below. It can be seen that a second entry has been added detailing the configuration used for the test and the associated statistics from the test run:

Copy
Copied!
            

cfg.companion.connection_string,cfg.pipeline.steps,cfg.pipeline.use_remote_input_buffers,cfg.pipeline.use_remote_output_buffers,cfg.pipeline.bulk_latency.lower_bound,cfg.pipeline.bulk_latency.bucket_width,cfg.run_limit.duration,cfg.run_limit.jobs,cfg.run_limit.bytes,cfg.data_provider.type,cfg.data_provider.output_buffer_size,cfg.device.pci_address,cfg.input.cwd,cfg.output.cwd,cfg.warmup_job_count,cfg.core_set,cfg.benchmark_mode,cfg.warnings_are_errors,cfg.attribute.doca_compress.algorithm,cfg.attribute.doca_ec.matrix_type,cfg.attribute.doca_ec.data_block_count,cfg.attribute.doca_ec.redundancy_block_count,cfg.attribute.doca_ec.use_precomputed_matrix,cfg.attribute.doca_eth.l3_chksum_offload,cfg.attribute.doca_eth.l4_chksum_offload,cfg.attribute.doca_sha.algorithm,cfg.uniform_job_size,cfg.core_count,cfg.per_core_thread_count,cfg.task_pool_size,cfg.data_provider_job_count,cfg.sg_config,cfg.mtu-size,cfg.send-queue-size,cfg.receive-queue-size,cfg.data-provider-input-file,cfg.attribute.mmo.log_qp_depth,cfg.attribute.mmo.log_num_qps,stats.input.job_count,stats.output.job_count,stats.input.byte_count,stats.output.byte_count,stats.input.throughput.bytes,stats.output.throughput.bytes,stats.input.throughput.rate,stats.output.throughput.rate ,[doca_sha],0,0,10000,1000,3,,,random-data,2048,d8:00.0,,,100,[2],throughput,0,,,,,,,,sha512,2048,1,2,1024,128,1 fragments,,,,,,,7246761,7246761,14841366528,463850048,036.673 Gib/s,001.146 Gib/s,2.403422 MOperations/s,2.403422 MOperations/s ,[doca_sha],0,0,10000,1000,3,,,random-data,2048,d8:00.0,,,100,[2],throughput,0,,,,,,,,sha512,2048,1,2,1024,128,1 fragments,,,,,,,7261710,7261710,14871982080,464806784,036.934 Gib/s,001.154 Gib/s,2.420512 MOperations/s,2.420512 MOperations/s


  • This test invokes DOCA Bench on the BlueField side to execute the SHA operation using the SHA1 algorithm and to display statistics every 2000 milliseconds during the test run

  • A list of 3 cores is provided with a count of 2 threads per core and a core-count of 1

  • The core-count instructs DOCA Bench to use the first core number in the core list, in this case core number 2

Command Line

Copy
Copied!
            

doca_bench    --core-list 2,3,4 \                 --core-count 1 \                 --threads-per-core 2 \                 --pipeline-steps doca_sha \                 --device 03:00.0 \                 --data-provider random-data \                 --uniform-job-size 2048 \                 --job-output-buffer-size 2048 \                 --run-limit-seconds 3 \                 -attribute doca_sha.algorithm=sha1 \                 --warm-up-jobs 100 \                 --rt-stats-interval 2000


Results Output

Copy
Copied!
            

Executing... Data path thread [0] started... WT[0] Executing 100 warm-up tasks using 100 unique tasks Data path thread [1] started... WT[1] Executing 100 warm-up tasks using 100 unique tasks Stats for thread[0](core: 2)         Duration:      965645 micro seconds         Enqueued jobs: 1171228         Dequeued jobs: 1171228         Throughput:    001.213 MOperations/s         Ingress rate:  018.505 Gib/s         Egress rate:   000.181 Gib/s Stats for thread[1](core: 2)         Duration:      965645 micro seconds         Enqueued jobs: 1171754         Dequeued jobs: 1171754         Throughput:    001.213 MOperations/s         Ingress rate:  018.514 Gib/s         Egress rate:   000.181 Gib/s Aggregate stats         Duration:      965645 micro seconds         Enqueued jobs: 2342982         Dequeued jobs: 2342982         Throughput:    002.426 MOperations/s         Ingress rate:  037.019 Gib/s         Egress rate:   000.362 Gib/s Stats for thread[0](core: 2)         Duration:      2968088 micro seconds         Enqueued jobs: 3653691         Dequeued jobs: 3653691         Throughput:    001.231 MOperations/s         Ingress rate:  018.783 Gib/s         Egress rate:   000.183 Gib/s Stats for thread[1](core: 2)         Duration:      2968088 micro seconds         Enqueued jobs: 3689198         Dequeued jobs: 3689198         Throughput:    001.243 MOperations/s         Ingress rate:  018.965 Gib/s         Egress rate:   000.185 Gib/s Aggregate stats         Duration:      2968088 micro seconds         Enqueued jobs: 7342889         Dequeued jobs: 7342889         Throughput:    002.474 MOperations/s         Ingress rate:  037.748 Gib/s         Egress rate:   000.369 Gib/s Cleanup... [main] Completed! tearing down... Stats for thread[0](core: 2)         Duration:      3000122 micro seconds         Enqueued jobs: 3694128         Dequeued jobs: 3694128         Throughput:    001.231 MOperations/s         Ingress rate:  018.789 Gib/s         Egress rate:   000.184 Gib/s Stats for thread[1](core: 2)         Duration:      3000089 micro seconds         Enqueued jobs: 3751128         Dequeued jobs: 3751128         Throughput:    001.250 MOperations/s         Ingress rate:  019.079 Gib/s         Egress rate:   000.186 Gib/s Aggregate stats         Duration:      3000122 micro seconds         Enqueued jobs: 7445256         Dequeued jobs: 7445256         Throughput:    002.482 MOperations/s         Ingress rate:  037.867 Gib/s         Egress rate:   000.370 Gib/s


Results Overview

Although a core list of 3 cores has been specified, the core-count value of 1 instructs DOCA Bench to use the first entry in the core list.

It can be seen that as a thread-count of 2 has been specified, there are 2 threads created on core 2.

A transient statistics interval of 2000 milliseconds has been specified, and the transient statistics per thread can be seen, as well as the final aggregate statistics.

  • This test invokes DOCA Bench to execute a local DMA operation on the host

  • It specifies that a core sweep should be carried out using core counts of 1, 2, and 4 using the option --sweep core-count,1,4,*2

  • Test output is to be saved in a CSV file /tmp/dma_sweep.csv and a filter is applied so that only statistics information is recorded. No configuration information is to be recorded.

Command Line

Copy
Copied!
            

doca_bench    --core-mask 0xff \                 --sweep core-count,1,4,*2 \                 --pipeline-steps doca_dma \                 --device d8:00.0 \                 --data-provider random-data \                 --uniform-job-size 2048 \                 --job-output-buffer-size 2048 \                 --run-limit-seconds 5 \                 --csv-output-file /tmp/dma_sweep.csv \                 --csv-stats "stats.*"


Results Overview

Copy
Copied!
            

Test permutations: [         Attributes: []         Uniform job size: 2048         Core count: 1         Per core thread count: 1         Task pool size: 1024         Data provider job count: 128         MTU size: -- not configured --         SQ depth: -- not configured --         RQ depth: -- not configured --         Input data file: -- not configured --         --------------------------------         Attributes: []         Uniform job size: 2048         Core count: 2         Per core thread count: 1         Task pool size: 1024         Data provider job count: 128         MTU size: -- not configured --         SQ depth: -- not configured --         RQ depth: -- not configured --         Input data file: -- not configured --         --------------------------------         Attributes: []         Uniform job size: 2048         Core count: 4         Per core thread count: 1         Task pool size: 1024         Data provider job count: 128         MTU size: -- not configured --         SQ depth: -- not configured --         RQ depth: -- not configured --         Input data file: -- not configured -- ]   [main] Initialize framework... [main] Start execution... Preparing permutation 1 of 3... Executing permutation 1 of 3... Data path thread [0] started... WT[0] Executing 100 warm-up tasks using 100 unique tasks Cleanup permutation 1 of 3... Aggregate stats         Duration:      5000191 micro seconds         Enqueued jobs: 22999128         Dequeued jobs: 22999128         Throughput:    004.600 MOperations/s         Ingress rate:  070.185 Gib/s         Egress rate:   070.185 Gib/s Preparing permutation 2 of 3... Executing permutation 2 of 3... Data path thread [0] started... WT[0] Executing 100 warm-up tasks using 100 unique tasks Data path thread [1] started... WT[1] Executing 100 warm-up tasks using 100 unique tasks Cleanup permutation 2 of 3... Stats for thread[0](core: 0)         Duration:      5000066 micro seconds         Enqueued jobs: 14409794         Dequeued jobs: 14409794         Throughput:    002.882 MOperations/s         Ingress rate:  043.975 Gib/s         Egress rate:   043.975 Gib/s Stats for thread[1](core: 1)         Duration:      5000188 micro seconds         Enqueued jobs: 14404708         Dequeued jobs: 14404708         Throughput:    002.881 MOperations/s         Ingress rate:  043.958 Gib/s         Egress rate:   043.958 Gib/s Aggregate stats         Duration:      5000188 micro seconds         Enqueued jobs: 28814502         Dequeued jobs: 28814502         Throughput:    005.763 MOperations/s         Ingress rate:  087.932 Gib/s         Egress rate:   087.932 Gib/s Preparing permutation 3 of 3... Executing permutation 3 of 3... Data path thread [1] started... Data path thread [0] started... WT[0] Executing 100 warm-up tasks using 100 unique tasks WT[1] Executing 100 warm-up tasks using 100 unique tasks Data path thread [3] started... WT[3] Executing 100 warm-up tasks using 100 unique tasks Data path thread [2] started... WT[2] Executing 100 warm-up tasks using 100 unique tasks Cleanup permutation 3 of 3... [main] Completed! tearing down... Stats for thread[0](core: 0)         Duration:      5000092 micro seconds         Enqueued jobs: 7227025         Dequeued jobs: 7227025         Throughput:    001.445 MOperations/s         Ingress rate:  022.055 Gib/s         Egress rate:   022.055 Gib/s Stats for thread[1](core: 1)         Duration:      5000081 micro seconds         Enqueued jobs: 7223269         Dequeued jobs: 7223269         Throughput:    001.445 MOperations/s         Ingress rate:  022.043 Gib/s         Egress rate:   022.043 Gib/s Stats for thread[2](core: 2)         Duration:      5000047 micro seconds         Enqueued jobs: 7229678         Dequeued jobs: 7229678         Throughput:    001.446 MOperations/s         Ingress rate:  022.063 Gib/s         Egress rate:   022.063 Gib/s Stats for thread[3](core: 3)         Duration:      5000056 micro seconds         Enqueued jobs: 7223037         Dequeued jobs: 7223037         Throughput:    001.445 MOperations/s         Ingress rate:  022.043 Gib/s         Egress rate:   022.043 Gib/s Aggregate stats         Duration:      5000092 micro seconds         Enqueued jobs: 28903009         Dequeued jobs: 28903009         Throughput:    005.780 MOperations/s         Ingress rate:  088.203 Gib/s         Egress rate:   088.203 Gib/s


Results Overview

The output gives a summary of the permutations being carried out and then proceeds to display the statistics for each of the permutations.

The CSV output file contents can be seen to contain only statistics information. Configuration information is not included.

There is an entry for each of the sweep permutations:

Copy
Copied!
            

stats.input.job_count,stats.output.job_count,stats.input.byte_count,stats.output.byte_count,stats.input.throughput.bytes,stats.output.throughput.bytes,stats.input.throughput.rate,stats.output.throughput.rate 22999128,22999128,47102214144,47102214144,070.185 Gib/s,070.185 Gib/s,4.599650 MOperations/s,4.599650 MOperations/s 28814502,28814502,59012100096,59012100096,087.932 Gib/s,087.932 Gib/s,5.762683 MOperations/s,5.762683 MOperations/s 28903009,28903009,59193362432,59193362432,088.203 Gib/s,088.203 Gib/s,5.780495 MOperations/s,5.780495 MOperations/s


This test invokes DOCA Bench to execute a local DMA operation on the host.

It specifies that a uniform job size sweep should be carried out using job sizes 1024 and 2048 using the option --sweep uniform-job-size,1024,2048.

Test output is to be saved in a CSV file /tmp/dma_sweep_job_size.csv and collection of environment information is enabled.

Command Line

Copy
Copied!
            

doca_bench    --core-mask 0xff \                 --core-count 1 \                 --pipeline-steps doca_dma \                 --device d8:00.0 \                 --data-provider random-data \                 --sweep uniform-job-size,1024,2048 \                 --job-output-buffer-size 2048 \                 --run-limit-seconds 5 \                 --csv-output-file /tmp/dma_sweep_job_size.csv \                 --enable-environment-information


Results Overview

Copy
Copied!
            

Test permutations: [         Attributes: []         Uniform job size: 1024         Core count: 1         Per core thread count: 1         Task pool size: 1024         Data provider job count: 128         MTU size: -- not configured --         SQ depth: -- not configured --         RQ depth: -- not configured --         Input data file: -- not configured --         --------------------------------         Attributes: []         Uniform job size: 2048         Core count: 1         Per core thread count: 1         Task pool size: 1024         Data provider job count: 128         MTU size: -- not configured --         SQ depth: -- not configured --         RQ depth: -- not configured --         Input data file: -- not configured -- ]   [main] Initialize framework... [main] Start execution... Preparing permutation 1 of 2... Executing permutation 1 of 2... Data path thread [0] started... WT[0] Executing 100 warm-up tasks using 100 unique tasks Cleanup permutation 1 of 2... Aggregate stats         Duration:      5000083 micro seconds         Enqueued jobs: 23645128         Dequeued jobs: 23645128         Throughput:    004.729 MOperations/s         Ingress rate:  036.079 Gib/s         Egress rate:   036.079 Gib/s Preparing permutation 2 of 2... Executing permutation 2 of 2... Data path thread [0] started... WT[0] Executing 100 warm-up tasks using 100 unique tasks Cleanup permutation 2 of 2... [main] Completed! tearing down... Aggregate stats         Duration:      5000027 micro seconds         Enqueued jobs: 22963128         Dequeued jobs: 22963128         Throughput:    004.593 MOperations/s         Ingress rate:  070.078 Gib/s         Egress rate:   070.078 Gib/s


Results Overview

The output gives a summary of the permutations being carried out and then proceeds to display the statistics for each of the permutations.

The CSV output file contents can be seen to contain statistics information and the environment information.

There is an entry for each of the sweep permutations.

Copy
Copied!
            

cfg.companion.connection_string,cfg.pipeline.steps,cfg.pipeline.use_remote_input_buffers,cfg.pipeline.use_remote_output_buffers,cfg.pipeline.bulk_latency.lower_bound,cfg.pipeline.bulk_latency.bucket_width,cfg.run_limit.duration,cfg.run_limit.jobs,cfg.run_limit.bytes,cfg.data_provider.type,cfg.data_provider.output_buffer_size,cfg.device.pci_address,cfg.input.cwd,cfg.output.cwd,cfg.warmup_job_count,cfg.core_set,cfg.benchmark_mode,cfg.warnings_are_errors,cfg.attribute.doca_compress.algorithm,cfg.attribute.doca_ec.matrix_type,cfg.attribute.doca_ec.data_block_count,cfg.attribute.doca_ec.redundancy_block_count,cfg.attribute.doca_ec.use_precomputed_matrix,cfg.attribute.doca_eth.l3_chksum_offload,cfg.attribute.doca_eth.l4_chksum_offload,cfg.attribute.doca_sha.algorithm,cfg.uniform_job_size,cfg.core_count,cfg.per_core_thread_count,cfg.task_pool_size,cfg.data_provider_job_count,cfg.sg_config,cfg.mtu-size,cfg.send-queue-size,cfg.receive-queue-size,cfg.data-provider-input-file,cfg.attribute.mmo.log_qp_depth,cfg.attribute.mmo.log_num_qps,stats.input.job_count,stats.output.job_count,stats.input.byte_count,stats.output.byte_count,stats.input.throughput.bytes,stats.output.throughput.bytes,stats.input.throughput.rate,stats.output.throughput.rate,host.pci.3.address,host.pci.3.ext_tag,host.pci.3.link_type,host.pci.2.ext_tag,host.pci.2.address,host.cpu.0.model,host.ofed_version,host.pci.4.max_read_request,host.pci.2.width,host.cpu.1.logical_cores,host.pci.2.eswitch_mode,host.pci.3.max_read_request,host.pci.4.address,host.pci.2.link_type,host.pci.1.max_read_request,host.pci.4.link_type,host.cpu.socket_count,host.pci.0.ext_tag,host.pci.6.port_speed,host.cpu.0.physical_cores,host.pci.7.port_speed,host.memory.dimm_slot_count,host.cpu.1.model,host.pci.0.max_payload_size,host.pci.6.relaxed_ordering,host.doca_host_package_version,host.pci.6.max_payload_size,host.pci.0.gen,host.pci.4.width,host.pci.2.gen,host.pci.1.max_payload_size,host.pci.4.relaxed_ordering,host.pci.3.width,host.cpu.0.logical_cores,host.cpu.0.arch,host.pci.4.port_speed,host.pci.4.eswitch_mode,host.pci.7.address,host.pci.5.eswitch_mode,host.pci.5.address,host.cpu.1.arch,host.pci.0.eswitch_mode,host.pci.7.width,host.pci.7.link_type,host.pci.1.link_type,host.pci.3.gen,host.pci.7.max_read_request,host.pci.7.eswitch_mode,host.pci.6.gen,host.pci.2.port_speed,host.pci.7.gen,host.pci.2.relaxed_ordering,host.pci.6.width,host.pci.4.gen,host.pci.6.address,host.hostname,host.pci.5.link_type,host.pci.6.link_type,host.pci.6.max_read_request,host.pci.7.max_payload_size,host.pci.5.gen,host.pci.6.eswitch_mode,host.pci.5.width,host.pci.3.relaxed_ordering,host.pci.4.ext_tag,host.pci.0.width,host.pci.5.port_speed,host.pci.2.max_payload_size,host.pci.3.max_payload_size,host.pci.5.max_payload_size,host.pci.2.max_read_request,host.pci.0.address,host.pci.gen,host.os.family,host.pci.1.gen,host.pci.5.relaxed_ordering,host.pci.1.port_speed,host.pci.7.ext_tag,host.pci.1.address,host.pci.3.eswitch_mode,host.pci.3.port_speed,host.pci.0.max_read_request,host.pci.1.ext_tag,host.pci.0.relaxed_ordering,host.pci.0.link_type,host.pci.5.max_read_request,host.pci.4.max_payload_size,host.pci.device_count,host.memory.populated_dimm_count,host.memory.installed_capacity,host.pci.6.ext_tag,host.os.kernel_version,host.pci.0.port_speed,host.pci.1.width,host.pci.7.relaxed_ordering,host.pci.1.relaxed_ordering,host.os.version,host.os.name,host.cpu.1.physical_cores,host.numa_node_count,host.pci.5.ext_tag,host.pci.1.eswitch_mode ,[doca_dma],0,0,10000,1000,5,,,random-data,2048,d8:00.0,,,100,"[0, 1, 2, 3, 4, 5, 6, 7]",throughput,0,,,,,,,,,1024,1,1,1024,128,1 fragments,,,,,,,23645128,23645128,24212611072,24212611072,036.079 Gib/s,036.079 Gib/s,4.728947 MOperations/s,4.728947 MOperations/s,0000:5e:00.1,true,Infiniband,true,0000:5e:00.0,N/A,OFED-internal-24.04-0.4.8,N/A,x63,N/A,N/A,N/A,0000:af:00.0,Infiniband,N/A,Ethernet,2,true,N/A,N/A,N/A,N/A,N/A,N/A,true,<none>,N/A,Gen15,x63,Gen15,N/A,true,x63,N/A,x86_64,104857600000,N/A,0000:d8:00.1,N/A,0000:af:00.1,x86_64,N/A,x63,Ethernet,Infiniband,Gen15,N/A,N/A,Gen15,N/A,Gen15,true,x63,Gen15,0000:d8:00.0,zibal,Ethernet,Ethernet,N/A,N/A,Gen15,N/A,x63,true,true,x63,104857600000,N/A,N/A,N/A,N/A,0000:3b:00.0,N/A,Linux,Gen15,true,N/A,true,0000:3b:00.1,N/A,N/A,N/A,true,true,Infiniband,N/A,N/A,8,N/A,270049112064,true,5.4.0-174-generic,N/A,x63,true,true,20.04.1 LTS (Focal Fossa),Ubuntu,N/A,2,true,N/A ,[doca_dma],0,0,10000,1000,5,,,random-data,2048,d8:00.0,,,100,"[0, 1, 2, 3, 4, 5, 6, 7]",throughput,0,,,,,,,,,2048,1,1,1024,128,1 fragments,,,,,,,22963128,22963128,47028486144,47028486144,070.078 Gib/s,070.078 Gib/s,4.592600 MOperations/s,4.592600 MOperations/s,0000:5e:00.1,true,Infiniband,true,0000:5e:00.0,N/A,OFED-internal-24.04-0.4.8,N/A,x63,N/A,N/A,N/A,0000:af:00.0,Infiniband,N/A,Ethernet,2,true,N/A,N/A,N/A,N/A,N/A,N/A,true,<none>,N/A,Gen15,x63,Gen15,N/A,true,x63,N/A,x86_64,104857600000,N/A,0000:d8:00.1,N/A,0000:af:00.1,x86_64,N/A,x63,Ethernet,Infiniband,Gen15,N/A,N/A,Gen15,N/A,Gen15,true,x63,Gen15,0000:d8:00.0,zibal,Ethernet,Ethernet,N/A,N/A,Gen15,N/A,x63,true,true,x63,104857600000,N/A,N/A,N/A,N/A,0000:3b:00.0,N/A,Linux,Gen15,true,N/A,true,0000:3b:00.1,N/A,N/A,N/A,true,true,Infiniband,N/A,N/A,8,N/A,270049112064,true,5.4.0-174-generic,N/A,x63,true,true,20.04.1 LTS (Focal Fossa),Ubuntu,N/A,2,true,N/A


  • This test invokes DOCA Bench to execute a remote DMA operation on the host

  • It specifies the companion connection details to be used on the host and that remote output buffers are to be used

Command Line

Copy
Copied!
            

doca_bench  --core-list 12 \             --pipeline-steps doca_dma \             --device 03:00.0 \             --data-provider random-data \             --uniform-job-size 2048 \             --job-output-buffer-size 2048 \             --use-remote-output-buffers \             --companion-connection-string proto=tcp,port=12345,mode=host,dev=17:00.0,user=bob,addr=10.10.10.10 \             --run-limit-seconds 5


Results Overview

Copy
Copied!
            

Executing... Worker thread[0](core: 12) [doca_dma] started... Worker thread[0] Executing 100 warm-up tasks using 100 unique tasks Cleanup... [main] Completed! tearing down... Aggregate stats         Duration:      5000073 micro seconds         Enqueued jobs: 32202128         Dequeued jobs: 32202128         Throughput:    006.440 MOperations/s         Ingress rate:  098.272 Gib/s         Egress rate:   098.272 Gib/s


Results Overview

None.

Note

This test is relevant for BlueField-2 only.

  • This test invokes DOCA Bench to run compression using random data as input

  • The compression algorithm specified is “deflate”

Command Line

Copy
Copied!
            

doca_bench    --core-list 2 \                 --pipeline-steps doca_compress::compress \                 --device 03:00.0 \                 --data-provider random-data \                 --uniform-job-size 2048 \                 --job-output-buffer-size 4096 \                 --run-limit-seconds 3 \                 --attribute doca_compress.algorithm="deflate"


Result Output

Copy
Copied!
            

Executing... Data path thread [0] started... WT[0] Executing 100 warm-up tasks using 100 unique tasks Cleanup... [main] Completed! tearing down... Aggregate stats         Duration:      3000146 micro seconds         Enqueued jobs: 5340128         Dequeued jobs: 5340128         Throughput:    001.780 MOperations/s         Ingress rate:  027.160 Gib/s         Egress rate:   027.748 Gib/s


Results Overview

None

  • This test invokes DOCA Bench to run decompression using random data as input

  • This test specifies a data provider of file set which contains the filename of an LZ4 compressed file

  • Remote input buffers are specified to be used for the input jobs

  • It specifies the companion connection details to be used on the host for the remote input buffers

Command Line

Copy
Copied!
            

doca_bench --core-list 12 \             --pipeline-steps doca_compress::decompress \             --device 03:00.0 \             --data-provider file-set \             --data-provider-input-file lz4_compressed_64b_buffers.fs \             --job-output-buffer-size 4096 \             --run-limit-seconds 3 \             --attribute doca_compress.algorithm="lz4" \             --use-remote-output-buffers \             --companion-connection-string proto=tcp,port=12345,mode=host,dev=17:00.0,user=bob,addr=10.10.10.10


Results Output

Copy
Copied!
            

Executing... Worker thread[0](core: 12) [doca_compress::decompress] started... Worker thread[0] Executing 100 warm-up tasks using 100 unique tasks Cleanup... [main] Completed! tearing down... Aggregate stats         Duration:      3000043 micro seconds         Enqueued jobs: 15306128         Dequeued jobs: 15306128         Throughput:    005.102 MOperations/s         Ingress rate:  003.155 Gib/s         Egress rate:   002.433 Gib/s


Results Comment

None

  • This test invokes DOCA Bench to run the EC creation step.

  • It runs in bulk latency mode and specifies the doca_ec attributes of data_block_count, redundancy_block_count, and matrix_type

Command Line

Copy
Copied!
            

doca_bench --mode bulk-latency \             --core-list 12 \             --pipeline-steps doca_ec::create \             --device 17:00.0 \             --data-provider random-data \             --uniform-job-size 1024 \             --job-output-buffer-size 1024 \             --run-limit-seconds 3 \             --attribute doca_ec.data_block_count=16 \             --attribute doca_ec.redundancy_block_count=16 \             --attribute doca_ec.matrix_type=cauchy


Results Output

Bulk latency output will be similar to that presented in section “BlueField-side Decompress LZ4 Sample”.

Results Comment

Bulk latency output will be similar to that presented earlier on this page.

  • This test invokes DOCA Bench to run the EC creation step

  • It runs in precision latency mode and specifies the doca_ec attributes of data_block_count, redundancy_block_count, and matrix_type

Command Line

Copy
Copied!
            

doca_bench    --mode precision-latency \                 --core-list 12 \                 --pipeline-steps doca_ec::create \                 --device 03:00.0 \                 --data-provider random-data \                 --uniform-job-size 1024 \                 --job-output-buffer-size 1024 \                 --run-limit-jobs 5000 \                 --attribute doca_ec.data_block_count=16 \                 --attribute doca_ec.redundancy_block_count=16 \                 --attribute doca_ec.matrix_type=cauchy


Results Output

None

Results Comment

Precision latency output will be similar to that presented earlier on this page.

  • This test invokes DOCA Bench in Comch consumer mode using a core-list on host side and BlueField side

  • The run-limit is 500 jobs

Command Line

Copy
Copied!
            

./doca_bench --core-list 4 --warm-up-jobs 32 --pipeline-steps doca_comch::consumer --device ca:00.0 --data-provider random-data --run-limit-jobs 500 --core-count 1 --uniform-job-size 4096 --job-output-buffer-size 4096 --companion-connection-string proto=tcp,mode=dpu,dev=03:00.0,user=bob,addr=10.10.10.10,port=12345 --attribute dopt.companion_app.path=<path to DPU doca_bench_companion application location> --data-provider-job-count 256 --companion-core-list 12


Results Output

Copy
Copied!
            

[main] Completed! tearing down... Aggregate stats         Duration:      1415 micro seconds         Enqueued jobs: 500         Dequeued jobs: 500         Throughput:    000.353 MOperations/s         Ingress rate:  000.000 Gib/s         Egress rate:   010.782 Gib/s


Results Comment

The aggregate statistics show the test completed after 500 jobs were processed.

  • This test invokes DOCA Bench in Comch producer mode using a core-mask on the host side and BlueField side

  • The run-limit is 1000 jobs

Command Line

Copy
Copied!
            

doca_bench    --core-list 4 \                 --warm-up-jobs 32 \                 --pipeline-steps doca_comch::producer \                 --device ca:00.0 \                 --data-provider random-data \                 --run-limit-jobs 500 \                 --core-count 1 \                 --uniform-job-size 4096 \                 --job-output-buffer-size 4096 \                 --companion-connection-string proto=tcp,mode=dpu,dev=03:00.0,user=bob,addr=10.10.10.10,port=12345 \                 --attribute dopt.companion_app.path=<path to DPU doca_bench_companion location> \                 --data-provider-job-count 256 \                 --companion-core-list 12


Results Overview

Copy
Copied!
            

[main] Completed! tearing down... Aggregate stats         Duration:      407 micro seconds         Enqueued jobs: 500         Dequeued jobs: 500         Throughput:    001.226 MOperations/s         Ingress rate:  037.402 Gib/s         Egress rate:   000.000 Gib/s


Results Comment

The aggregate statistics show the test completed after 500 jobs were processed.

  • This test invokes DOCA Bench in RDMA send mode using a core-list on the send and receive side

  • The send queue size is configured to 50 entries

Command Line

Copy
Copied!
            

doca_bench  --pipeline-steps doca_rdma::send \             --device d8:00.0 \             --data-provider random-data \             --uniform-job-size 2048 \             --job-output-buffer-size 2048 \             --run-limit-seconds 3 \             --send-queue-size 50 \             --companion-connection-string proto=tcp,addr=10.10.10.10,port=12345,user=bob,dev=ca:00.0 \             --companion-core-list 12 \             --core-list 12


Results Output

Copy
Copied!
            

Test permutations: [         Attributes: []         Uniform job size: 2048         Core count: 1         Per core thread count: 1         Task pool size: 1024         Data provider job count: 128         MTU size: -- not configured --         SQ depth: 50         RQ depth: -- not configured --         Input data file: -- not configured -- ]


Results Comment

The configuration output shows the send queue size configured to 50.

  • This test invokes DOCA Bench in RDMA receive mode using a core-list on the send and receive side

  • The receive queue size is configured to 100 entries

Command Line

Copy
Copied!
            

doca_bench  --pipeline-steps doca_rdma::receive \             --device d8:00.0 \             --data-provider random-data \             --uniform-job-size 2048 \             --job-output-buffer-size 2048 \             --run-limit-seconds 3 \             --receive-queue-size 100 \             --companion-connection-string proto=tcp,addr=10.10.10.10,port=12345,user=bob,dev=ca:00.0 \             --companion-core-list 12 \             --core-list 12


Results Output

Copy
Copied!
            

Test permutations: [         Attributes: []         Uniform job size: 2048         Core count: 1         Per core thread count: 1         Task pool size: 1024         Data provider job count: 128         MTU size: -- not configured --         SQ depth: -- not configured --         RQ depth: 100         Input data file: -- not configured -- ]


Results Overview

The configuration output shows the receive queue size configured to 100.

© Copyright 2024, NVIDIA. Last updated on May 7, 2024.