RXPBench is a performance comparison tool for NVIDIA® BlueField® RXP.
RXPBench is a tool that allows for the performance comparison between the NVIDIA® RXP® hardware RegEx acceleration engine found in the NVIDIA® BlueField® DPU and the Intel® Hyperscan software library. It provides a comprehensive set of options and can facilitate ingress of data from live network ports or previously recorded PCAP files.
This document provides the following information for RXPBench:
- Example use case
- Breakdown of analysis and runtime statistics
- Options and configuration settings available
The terms listed in the following table are used in this document.
|Job||A unit of data for the RXP to scan. A job can be a packet, packet header, packet payload, packet header and payload, or a block of user-defined data.|
|Job Directory||A directory of custom files that contain test data and expected results, used to validate that the matches returned are as expected.|
|RegEx||A common abbreviation for regular expression.|
|Regular expression||A regular expression is a concise and flexible means for matching strings of text, such as particular characters, words, or patterns of characters. A common abbreviation for this is "RegEx".|
|ROF file||The compiled Regex rules as object code, produced by the RXP compiler, and programmed into the RXP engine.|
|Ruleset||A list of regular expressions and strings that can be compiled into object code by the RXP Compiler and executed on the RXP.|
|RXP||High-speed, hardware-accelerated regular expression engine|
|RXPC||The external compiler application that translates regular expressions into compiled object code (ROF file)|
The acronyms listed in the following table are used in this document.
|HS||Intel® Hyperscan Software Library|
|PCRE||Perl Compatible Regular Expressions|
|ROF||RXP Object Format (currently at version 2)|
|RXP||Regular eXpression Processor|
|RXPC||Regular eXpression Processor Compiler|
The following icons are used within this document:
Whilst the primary focus of this tool is to provide accurate real-world performance comparisons between the Intel® Hyperscan software library (HS) and the BlueField-2 RXP hardware acceleration engine, it has additional functionality. This functionality includes:
- Execution on both Intel host and the BlueField-2 DPU Arm cores
- Multicore support
- Ingress of traffic from live DPDK network ports, or PCAP files
- Can act as a "bump in the wire"
- Ability to accept RXP, Hyperscan, and generic rules files
- Asynchronous operations, similar to end-user applications
- Comprehensive configuration through a configuration file or command line options
- A high-performance reference application for DPDK RegEx operations
RXPBench utilizes either the high-speed DOCA (doca_regex) framework, or DPDK (dpdk_regex) to provide hardware accelerated regular expression offloading using the RXP engine. Software-based RegEx evaluation is provided through the standard Hyperscan library.
The following is an overview of the RXPBench architecture:
This diagram shows the relationship between RXPBench and the underlying BlueField-2 hardware when the application is being run on the x86 host (left side) or on the BlueField-2 Arm cores (right).
At the core of the application is a packet processing engine designed to acquire packets from a network or file source. These packets are then processed through DPDK threads and offloaded to the RXP hardware accelerator using the high performance DOCA or DPDK libraries for pattern matching (or via the Hyperscan library).
RXPBench is installed as part of the standard DOCA installation process via the
Refer to the NVIDIA DOCA Installation Guide for Linux for instructions on how to install DOCA if you have not done so yet.
|Linux Distribution||Hyperscan Version||Installation Command|
|CentOS 8.x||–||Hyperscan is provided through 3rd party vendors. The following command will install Hyperscan 5.3.0 on CentOS 8:
Hyperscan is provided through 3rd party vendors. The following command will install Hyperscan 5.3.0 on CentOS 7:
The RXPBench utility is provided as part of the DOCA framework and is therefore installed by default with the BFB.
RXPBench provides support for both the DOCA framework and the DPDK.
While most functionalities are supported in both frameworks, DOCA provides additional features to further enhance RegEx pattern matching. A dedicated icon is used to indicate a DOCA-only feature.
For information on selecting between DOCA and DPDK, refer to the
This example will focus on the execution of RXPBench using a simple text file containing the works of William Shakespeare (shakespeare.txt), using rules in Hyperscan format (henry.hs) whilst executing on the BlueField-2 RXP engine.
RXPBench supports the configuration of options through a "configuration" file, or through the command line. In practice, if a configuration file is used, the command line options are still available and will override any options already present in the configuration file.
By default, RXPBench always searches for a file named "rxpbench.conf"; this allows common set-up options to be removed from the command line. The DPDK EAL (-D) options are commonly placed in this file as they rarely change after being initially set.
For a full list of options, see section General Configuration Options.
In this example, all the options are provided on the command line; there is no configuration file. The command line required to execute the example is as follows:
./rxpbench -D "-l 0,1,2,3 -n 1 -a 5e:00.0,class=regex --file-prefix=rxpbench -a 5e:00.01" --input-mode text_file -f ../Shakespeare.txt -d rxp -R ../henry.hs -l 2048 -n 10000 -c 1
The "-D" option provides the DPDK EAL options, contained within a set of quotation marks ("). These options are passed directly to DPDK during the initialization of the application and are, in general, specific to your host.
The first RXPBench option is "--input-mode", which states that RXPBench will pull data from a "text_file". The "-f" option then specifies the location and name of the text file to be searched.
The "-d" option states the mode in which RXPBench will operate. Available options are "rxp" or "hs"; in this instance, we are requesting that the BlueField-2 hardware accelerator is used.
The "-R" option provides the tool with a set of uncompiled rules; in this case, they are presented in Hyperscan format. RXPBench supports the use of rules formats that are different from the selected device/algorithm. For example, the RXP can accept Hyperscan rules and the Hyperscan library can receive RXP-formatted rules. The conversion process within RXPBench is automatic.
The "-l" option supplies the size of the data block sent to the device/algorithm. In this instance, a buffer of 2KB is read from the text file and pattern matched.
The number of iterations is controlled by the "-n" option. Due to the high performance of the BlueField-2 RXP engine, the input file must be iterated 10,000 times to provide enough input data to ensure a run-time of a few seconds.
The final option is the core count (-c), which defines how many CPU cores the tool can use. In this instance, we are using a single core.
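As a rough illustration of how the options described above fit together, the following Python sketch assembles the same command line as an argument list. The helper name `build_rxpbench_argv` is hypothetical and not part of RXPBench; the option values are copied from the example above. The point it shows is that the EAL options travel as one single argument after "-D".

```python
import shlex


def build_rxpbench_argv(eal_options, input_mode, input_file, device,
                        rules_file, buf_length, iterations, cores,
                        binary="./rxpbench"):
    """Assemble the argument list for the example run (hypothetical helper)."""
    return [
        binary,
        "-D", eal_options,            # all EAL options form ONE argument
        "--input-mode", input_mode,   # where the data comes from
        "-f", input_file,             # file to be searched
        "-d", device,                 # "rxp" or "hs"
        "-R", rules_file,             # uncompiled rules file
        "-l", str(buf_length),        # bytes per job
        "-n", str(iterations),        # passes over the input file
        "-c", str(cores),             # CPU cores to use
    ]


argv = build_rxpbench_argv(
    eal_options="-l 0,1,2,3 -n 1 -a 5e:00.0,class=regex "
                "--file-prefix=rxpbench -a 5e:00.01",
    input_mode="text_file",
    input_file="../Shakespeare.txt",
    device="rxp",
    rules_file="../henry.hs",
    buf_length=2048,
    iterations=10000,
    cores=1,
)

# shlex.join quotes the EAL string so a shell would pass it as one argument.
print(shlex.join(argv))
```

Passing the list to a process launcher without a shell avoids any quoting issues with the EAL string.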
RXPBench accepts regular expressions in two different formats:
- Uncompiled – The regular expressions are presented in a text file which follows the RXP rules file format or the Hyperscan file format.
- Compiled – In the case of the RXP, rules are externally compiled using the RXP Compiler (rxpc) and presented to RXPBench as a ROF file.
If uncompiled rule files are used, RXPBench can cross compile the rules regardless of the file format or device selected, i.e., a Hyperscan format rules file will be converted for use by the RXP engine, whilst an RXP format rules file will be converted for use by the Hyperscan engine.
RXPBench presents information during its start-up that indicates the progress of compilation of uncompiled rules, as well as the success of programming those compiled rules to the device:
If any errors or warnings are detected during the compilation process, RXPBench provides detailed information on the problematic regular expression. For example:
During the execution of RXPBench a series of run-time statistics are presented by the utility. This provides detailed information on the current process:
For each core in use by the tool, the following statistics are presented:
- Received Bytes – These are bytes received from the input source
- Regex Bytes – These are the bytes transmitted to the Regex engine; this value can be less than the received byte count if certain configuration options are used (such as payload thresholds or if only “app layer” payloads are being scanned)
- Recv Bufs – The total buffers received from the input source; in this case, due to the “-l 2048” option, each buffer contains 2048 bytes.
- Regex Bufs – The number of buffers transmitted to the Regex device.
- Matches – The total number of Regex matches seen in the input data.
In addition to the per-core statistics, a running total is provided. It includes aggregated values for the above fields and a duration field, and it also provides:
- Regex Perf (total) – The performance total in Gigabits per second (Gb/s) for the entire duration of the run.
- Regex Perf (split) – The performance total in Gigabits per second (Gb/s) for the past update period.
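The difference between the two performance figures can be sketched as follows. This is an illustrative calculation only, not RXPBench's own code: it assumes that one gigabit is 10^9 bits and that the tool samples a cumulative Regex byte counter at each update; the function names and sample values are hypothetical.

```python
def gbits_per_second(num_bytes, seconds):
    """Convert a byte count over a time span into gigabits per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_bytes * 8 / seconds / 1e9


def perf_total_and_split(samples):
    """Return (total, split) for the most recent sample.

    samples is a list of (elapsed_seconds, cumulative_regex_bytes),
    oldest first. "total" covers the whole run; "split" covers only
    the last update period.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    first_t, first_b = samples[0]
    prev_t, prev_b = samples[-2]
    last_t, last_b = samples[-1]
    total = gbits_per_second(last_b - first_b, last_t - first_t)
    split = gbits_per_second(last_b - prev_b, last_t - prev_t)
    return total, split


# Made-up samples: 1.25 GB scanned in the first second, 2.5 GB in the next.
samples = [(0.0, 0), (1.0, 1_250_000_000), (2.0, 3_750_000_000)]
total, split = perf_total_and_split(samples)
print(f"total={total:.1f} Gb/s split={split:.1f} Gb/s")
# prints "total=15.0 Gb/s split=20.0 Gb/s"
```

A split value that differs noticeably from the total value indicates that throughput changed during the most recent update period.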
This section of statistics provides an overview of the RXPBench configuration; most of this information simply mirrors the supplied configuration.
The “Preloaded Data Info” section details any preloading of data, when using PCAP or text files, that has occurred during the initialization of the application:
- Data Length – When the input is file based (PCAP or Text) this is the total data that is preloaded/cached to reduce I/O operations
- App Layer Mode – Whether the application is effectively scanning the application layer (TCP/UDP frames) and ignoring the headers (Ethernet, MAC, etc.) prior to the application layer.
- Valid Packets – If app layer mode is enabled, these are packets that contain a valid payload
- Invalid Length – This value is incremented if a PCAP packet is found to be unexpectedly truncated
- Unsupported Prot – If app layer mode is enabled, the packet did not contain one of the required protocols (VLAN/IPv4/IPv6/TCP or UDP)
This section provides an overview of the RXPBench execution; it provides the core statistics which allow you to gauge the performance of the algorithm using the supplied rules and input data.
While most of these fields are self-explanatory some fields require further definition:
- Packet Processing Rate (Mpps) – This is the rate, in millions of packets per second, at which RXPBench has been able to acquire packets from the input source (physical port or precached PCAP/text file). For physical ports, this rate may differ from the RegEx PPR value as not all packets (depending on configuration) may be sent to the RegEx device.
- Packet Processing Perf (Gb/s) – The actual data-rate of the input source in Gigabits per second
- Total Regex Buffers – This is the number of complete buffers that were sent to the RegEx device for processing
- Total Regex Bytes – The total bytes contained within all buffers transmitted to the RegEx device for processing
- Total Regex Batches – RegEx buffers are gathered together into batches (based on the "-g" flag) and submitted to the RegEx device in a single operation
If the selected RegEx device is "rxp" or "regex_dpdk" the following block of statistics is provided. It presents more internal statistics from the DPDK RegEx device (BlueField-2 RXP):
The following are the definitions of each of these counters:
- Invalid Responses – These are responses from operations that have not completed successfully
- Timeout – When processing a block of input data a hardware triggered timeout occurred and the search was aborted
- Max Matches – The maximum number of configured matches was exceeded, and the job was aborted
- Max Prefixes – The maximum prefixes per scan was exceeded, and the job was aborted
- Resource Limit – A generic/internal resourcing limit was reached; the job was aborted
- Latency Figures – These provide the maximum, minimum, and average latency of jobs transmitted to the DPDK RegEx device
It is important to note that in normal mode (i.e., not in latency mode) RXPBench ensures that the hardware is supplied with data that is designed to maximize throughput. As stated previously, latency figures in this mode are not calculated accurately. To view the correct hardware latency, make sure the --latency-mode option is provided. The following screenshot shows the RegEx stats with latency mode enabled:
If the selected Regex device is “hs” (or “Hyperscan”) then an additional block of statistics is provided detailing the latency of requests to and responses from the Hyperscan library:
Configuration options to control the operation of RXPBench can be provided either through a pre-defined configuration file or through the command line. If both a configuration file and a set of command line options are supplied, the command line options supersede, effectively overriding, the options present in the configuration file.
The configuration file option allows you to supply a text file that contains one or more options that would normally be present on the command line.
-C configuration.file
--config-file configuration.file
The file should contain each configuration option stripped of the leading dashes on a new line. A colon (:) should be placed between the option and the value. You may use either the short (-) or long (--) option name. For example:
input-mode : dpdk_port
m : inputfile.pcap
run-time-secs : 10
If the “-C” or “--config-file” option is used without any supplied parameter, RXPBench will attempt to open the default file “rxpbench.conf”.
Providing any additional command line options after the --config-file option will override any present within the configuration file.
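The file format and override behavior described above can be sketched as below. This is a minimal illustration, not RXPBench's actual parser: the function names are hypothetical, options without a colon are stored with an empty value (an assumption, as the source does not describe that case), and short and long names for the same option are not unified.

```python
def parse_config_text(text):
    """Parse "option : value" lines into a dict (illustrative only)."""
    options = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        # Split on the FIRST colon only, so values such as PCI
        # addresses ("5e:00.0") keep their own colons.
        name, _, value = line.partition(":")
        options[name.strip()] = value.strip()
    return options


def apply_overrides(file_options, cli_options):
    """Command line options take precedence over configuration file options."""
    merged = dict(file_options)
    merged.update(cli_options)
    return merged


config_text = """
input-mode : dpdk_port
m : inputfile.pcap
run-time-secs : 10
"""

file_options = parse_config_text(config_text)
effective = apply_overrides(file_options, {"run-time-secs": "30"})
print(effective)
```

In this sketch the run time from the file (10) is replaced by the value given on the command line (30), while the other file options are kept.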
RXPBench utilizes the DPDK framework to provide core memory management, packet ingress and Regular expression offloading. As common with DPDK applications there are several EAL options that can be used to ensure DPDK is optimally configured for the host environment.
EAL options should be enclosed in quotations (“..”) and are passed directly to DPDK without any processing by RXPBench.
If you create a custom set of EAL options, please ensure that the “class=regex” parameter is included so that the Regex device is available for use. Use “class=eth:regex” if you wish to use both packet acquisition from physical ports and Regex.
The CPU cores selected for use through the EAL options will be the same cores used by the whole RXPBench application.
Care should be taken when selecting EAL options. Misconfiguration may affect the utility's ability to obtain maximum performance on the target hardware. A full list of EAL options is provided by DPDK.
This option provides additional verbose output on any matching patterns found by the Regex algorithm. The supplied integer value dictates the amount of information provided:
-V 1
-V 2
-V 3
All verbose levels write to CSV files named "rxpbench_matches_main_core_XX.csv", where XX is the main logical core ID returned by DPDK, and "rxpbench_matches_core_XX.csv" for additional cores in a multicore environment.
Each entry in the CSV file provides match information including queue ID, rule ID, start offset and length. If the verbose level is set to 3 then the match string is also returned.
-V 2 and -V 3 will cause the writing of large amounts of data if a substantial number of matches are reported and it may result in characters that break the CSV format (such as commas, new lines, etc.) being placed in the output file. In extreme cases this may result in a performance reduction.
The "Cores" option allows for the configuration of the total number of cores available to RXPBench.
The use of the CPU cores is dependent on the application’s Regular Expression algorithm and whether packets are being received from an ingress port or PCAP capture file.
If the BlueField-2 RXP hardware accelerator is used each core will be given a unique DPDK Regex queue to operate on; if the accelerator is Hyperscan then each core will be used to execute the Hyperscan software library.
In addition, if packets are being received from a physical port, the value will be used to allocate X number of DPDK Tx and Rx queues on the port.
The value supplied here must be ≤ the number of cores provided in the -D (EAL) options. If an invalid value is supplied a warning will be produced and the EAL (-D) core count will be used.
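The relationship between the core count and the EAL core list can be sketched as follows. This is an assumption-laden illustration rather than RXPBench's implementation: the function names are hypothetical, only the comma and range forms of the EAL "-l" core list are handled, and a requested value below 1 is treated as invalid.

```python
import shlex
import sys


def parse_core_list(corelist):
    """Expand an EAL core list such as "0,1,2,3" or "0-3,8" into a set."""
    cores = set()
    for part in corelist.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cores.update(range(int(low), int(high) + 1))
        else:
            cores.add(int(part))
    return cores


def count_eal_cores(eal_options):
    """Count the cores named by "-l" in an EAL option string, or None."""
    tokens = shlex.split(eal_options)
    for index, token in enumerate(tokens[:-1]):
        if token == "-l":
            return len(parse_core_list(tokens[index + 1]))
    return None


def effective_core_count(requested, eal_cores):
    """Fall back to the EAL core count when the requested value is invalid."""
    if 1 <= requested <= eal_cores:
        return requested
    print(f"warning: core count {requested} is invalid, using {eal_cores}",
          file=sys.stderr)
    return eal_cores


eal_cores = count_eal_cores("-l 0,1,2,3 -n 1 -a 5e:00.0,class=regex")
print(effective_core_count(1, eal_cores))
print(effective_core_count(8, eal_cores))
```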
This group of options provides the ability to select the Algorithm (BF2 RXP or Hyperscan), where input data should be received from (physical ports, text files, or PCAP files) and Regular Expression rules information.
This option allows you to select the underlying framework to use (DOCA/DPDK) and whether acceleration should be provided by the BlueField RXP hardware accelerator or the Hyperscan software library. Each option is provided with a short (e.g., doca) or long (e.g., doca_regex) version:
--Regex-dev regex_dpdk
--Regex-dev rxp
--Regex-dev hyperscan
--Regex-dev hs
RXPBench can receive data from various input sources. This option allows you to provide which method you require:
--input-mode dpdk_port --dpdk-primary-port X --dpdk-secondary-port Y
--input-mode pcap_file
--input-mode text_file
--input-mode job_format
The DPDK port option enables RXPBench to receive live traffic from a port, specified in the --dpdk-primary-port option. If the secondary port option exists (--dpdk-second-port) then any packets received, after pattern matching has occurred, are transmitted onto the second port.
See section DPDK Port Operations for more information.
This option allows you to supply an external PCAP file. This allows for reproducible results using a known input file. The entire payload recorded in each frame within the pcap file is made available to RXPBench.
If processing of a standard text file is required, this option allows you to select any file. The entire text file contents are made available to the RXPBench application with no parsing or changes made.
This option is used to provide a specific "job format" directory to RXPBench. This directory contains files, provided by NVIDIA or through your NVIDIA Networking Support representative, that include data and results to validate that matches returned by the algorithms are expected.
This allows RXPBench to validate that all aspects of the hardware, libraries, and software are operating correctly. In normal operation, this mode is not used, but information on this mode is provided for your reference.
RXPBench supports the ability to scan remote memory through the remote_mmap input mode. For clarity, remote means that while RXPBench runs directly within the DPU, the data being scanned by the RegEx engine resides entirely in memory on the host.
To facilitate this operation, a companion application, doca_remote_memory_app, is executed on the host. This companion application allows you to load an input file into memory (to be scanned). It then provides an output file (the export definition). This file, mmap_export.def, should be transferred to the DPU. This file is then used as the input file parameter, -f, for RXPBench when using remote_mmap input mode.
This application is provided (with the installation of RXPBench) to export an area of host memory filled with a specific input file. The following parameters allow you to control the application's execution.
||This is the PCIe address of the RegEx acceleration on the DPU|
||Path to the file whose contents are loaded into memory (maximum 2GB)|
||Path to mmap export file used to remotely access to memory|
In normal operation, doca_remote_memory_app allocates memory and populates this with the input file as provided by the --input_file parameter. It then generates a mmap_export.def file at the location provided by the --export_file flag and then sleeps waiting on a CTRL+C or kill signal to exit.
This application must continue to execute until RXPBench has completed scanning. Otherwise, the memory (and therefore data to be scanned) is released and made unavailable.
The RXP hardware accelerator can accept regular expressions that have been externally compiled using the RXPC (RXP Compiler) into a ROF file. This option allows you to specify this ROF file.
RXPBench can accept an input file containing raw regular expressions. The uncompiled rules file can be in either of these formats:
- RXP rules file
- Hyperscan rules file
The tool can accept either format of rules file, regardless of which algorithm (BlueField-2 RXP, or Intel Hyperscan) is used. In the case where a rules file is not in the expected format for the algorithm, a conversion process is employed to ensure they operate correctly.
In this configuration, the BlueField-2 RXP compiler is configured with its default optimizations; enhanced performance can be obtained through the adjustment of these parameters. For more information see the NVIDIA RXP Compiler Tool Guide, and provide any compiled rules through the
This option will cause RXPBench to extract the upper-layer data from the received packets and submit them for regular expression testing. Upper-layer data includes data found in TCP and UDP streams found in IPv4 and IPv6 packets (including any such data contained within VLAN tagged packets).
This is the port where packets will be received from. The supplied ID is used directly to access the requested DPDK port.
RXPBench can be used as a "bump" in the wire, where received packets are pattern matched before transmission through a secondary port. This option provides the port ID, as used by DPDK, for the onwards transmission of scanned packets.
RXPBench can accept both uncompiled and compiled rules. As part of the initialization process, any uncompiled regular expression rules must be compiled into object code that can be executed on the BlueField RXP hardware accelerator or Intel™ Hyperscan software library.
While the BlueField RXP supports a wide range of RegEx constructs, neither it nor Hyperscan can support all constructs due to complexity and performance impacts.
When a supplied set of regular expressions is compiled, either algorithm may abort the compilation due to the inclusion of one or more unsupported rule constructs. This option prevents the compilers from aborting, and forces RXPBench to continue with the rules that successfully compiled.
This option activates single-line mode so any new line characters will not match.
This option activates caseless matching which causes all rules to be seen as case insensitive.
This option controls how anchors are handled in regular expressions. If enabled, anchoring is applied per line.
This option activates free spacing mode. Effectively, the white spaces in rules are ignored.
This option sets the time in seconds that a test ought to be run for. If a file is used as input and no -n option is set, then the file is looped over until the time period is met.
If input data is being received from either a PCAP file or text file, this option is used to limit the execution to a complete number of iterations of the input file. For example, if an iteration count of 4 is given for a PCAP file that contains 1,000 packets, the total number of packets processed is 4,000. If the input file is a standard text file containing 5,000 bytes of information, an iteration count of 4 means that 20,000 bytes are read by RXPBench.
If iterations are used along with a runtime-seconds option, the test finishes at whichever limit comes first.
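The iteration arithmetic and the "whichever limit comes first" rule can be sketched as below. The function names are hypothetical and the stop check is only an illustration of the rule as described, not RXPBench's own logic.

```python
def total_units(units_per_pass, iterations):
    """Packets (or bytes) processed after a whole number of passes."""
    return units_per_pass * iterations


def run_is_complete(passes_done, elapsed_secs,
                    iterations=None, run_time_secs=None):
    """True once either configured limit has been reached."""
    hit_iterations = iterations is not None and passes_done >= iterations
    hit_time = run_time_secs is not None and elapsed_secs >= run_time_secs
    return hit_iterations or hit_time


print(total_units(1000, 4))   # PCAP file with 1,000 packets, 4 iterations
print(total_units(5000, 4))   # text file with 5,000 bytes, 4 iterations

# With both limits set, the run ends at whichever is reached first.
print(run_is_complete(2, 10.0, iterations=4, run_time_secs=10))
```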
This option sets the total number of packets that are read from the selected input mode. After this number of packets is read from a file or received from a network port, rxpbench will complete.
Regardless of the input mode this is the total number of bytes received that is required to mark the execution as complete.
For live traffic this is the total number of bytes received from the physical port. For both the PCAP input file and text file this is the total bytes to read from the input files.
When RXPBench is reading from input files (whether PCAP or text files), it has all the information readily available (unlike live traffic, which must be received). This allows the application to read a variable amount of input data per iteration.
This option controls the amount of data that is read from the input file and passed to the RegEx algorithm.
With PCAP capture files, this option may result in data being transmitted from part of a packet, or alternatively from multiple packets, if the buffer length supplied is less than or greater than the PCAP’s frame length.
When live traffic is being received from a physical port, this option specifies the received packet’s minimum size before it will be processed.
For example, setting this value to 256 bytes means that if a packet arrives that is less than 256 bytes in length it will not be processed by RXPBench.
Packets that are dropped by this threshold are recorded in the statistics under “UNDER THRES” field.
When the input is being read from files (either PCAP, or text files) this option allows a certain number of bytes to be overlapped from the previous frame.
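How a buffer length and an overlap might interact when reading from a file can be sketched as follows. This is one plausible interpretation, stated as an assumption: each buffer after the first begins `overlap` bytes before the end of the previous one, and the overlap counts towards the buffer length. The function name is hypothetical.

```python
def split_with_overlap(data, buf_length, overlap=0):
    """Cut data into buffers of at most buf_length bytes.

    Each buffer after the first repeats the last `overlap` bytes of the
    previous buffer (assumed semantics, see the text above).
    """
    if buf_length <= 0:
        raise ValueError("buf_length must be positive")
    if not 0 <= overlap < buf_length:
        raise ValueError("overlap must be in the range 0..buf_length-1")
    step = buf_length - overlap
    buffers = []
    start = 0
    while start < len(data):
        buffers.append(data[start:start + buf_length])
        if start + buf_length >= len(data):
            break  # this buffer already reached the end of the data
        start += step
    return buffers


print(split_with_overlap(b"abcdefghij", 4))
print(split_with_overlap(b"abcdefghij", 4, overlap=1))
```

With an overlap, a pattern that straddles the boundary between two buffers still appears whole in one of them, provided it is no longer than the overlap plus one byte.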
Most high-performance applications obtain additional performance by batching together multiple operations into a single process.
DPDK Regex provides the capability of enqueuing multiple buffers to the BlueField-2 RXP Hardware accelerator. This option allows you to specify how many payloads should be grouped together before enqueuing on the hardware.
If receiving packets from a physical port this also determines the batch size to read (and write) to the network ports.
If this option is not supplied, RXPBench defaults to grouping (batching) together 64 packets at a time.
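The grouping described above can be illustrated with a short sketch. It only shows how a list of payloads is divided into batches of a given size (64 by default, as stated above); the function name is hypothetical and no DPDK call is involved.

```python
def group_into_batches(buffers, group_size=64):
    """Divide buffers into consecutive batches of at most group_size items."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return [buffers[i:i + group_size]
            for i in range(0, len(buffers), group_size)]


payloads = [bytes([n % 256]) for n in range(130)]
batches = group_into_batches(payloads)
print([len(batch) for batch in batches])   # prints [64, 64, 2]
```

Each batch would correspond to a single enqueue operation, so 130 payloads need three operations rather than 130.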
This option will process each received payload packet and identify any layer 5 to layer 7 information present in them. It will then send only this layer 5 to layer 7 data to the RegEx algorithm.
For example, if a 500-byte packet is received that contains 60 bytes of layer 1 to layer 4 data, then the first 60 bytes are ignored and the 440 bytes of layer 5 to layer 7 data is sent to the RegEx algorithm.
For PCAP-based input files, any buffer length (--buf-length) option will be overridden and lengths will be assigned on a per-packet basis. Similarly, for live traffic received from a physical port, each packet is processed independently, with the data from its layer 5 through layer 7 being sent to the RegEx algorithm. It may be appropriate to use the threshold option (-t) to remove small payloads.
Using this option in live mode may increase the average job size due to the skipping of certain “no payload” frames (such as TCP ACKs) that would otherwise be included.
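The idea of skipping the layer 1 to layer 4 headers can be sketched as below. This is a simplified illustration and not RXPBench's parser: it assumes Ethernet II framing, handles VLAN tags, IPv4 and IPv6 (without extension headers), TCP and UDP, and ignores trailing Ethernet padding. The function name is hypothetical.

```python
import struct

ETH_P_IPV4, ETH_P_IPV6, ETH_P_VLAN = 0x0800, 0x86DD, 0x8100
IPPROTO_TCP, IPPROTO_UDP = 6, 17


def app_layer_payload(frame):
    """Return the bytes after the TCP/UDP header, or None if unsupported."""
    if len(frame) < 14:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    offset = 14
    while ethertype == ETH_P_VLAN:            # skip any 802.1Q tags
        if len(frame) < offset + 4:
            return None
        (ethertype,) = struct.unpack_from("!H", frame, offset + 2)
        offset += 4
    if ethertype == ETH_P_IPV4:
        if len(frame) < offset + 20:
            return None
        header_len = (frame[offset] & 0x0F) * 4   # IHL field, in bytes
        if header_len < 20:
            return None
        protocol = frame[offset + 9]
        offset += header_len
    elif ethertype == ETH_P_IPV6:
        if len(frame) < offset + 40:
            return None
        protocol = frame[offset + 6]              # next header field
        offset += 40
    else:
        return None
    if protocol == IPPROTO_TCP:
        if len(frame) < offset + 20:
            return None
        tcp_len = (frame[offset + 12] >> 4) * 4   # data offset, in bytes
        if tcp_len < 20:
            return None
        offset += tcp_len
    elif protocol == IPPROTO_UDP:
        if len(frame) < offset + 8:
            return None
        offset += 8
    else:
        return None
    if offset > len(frame):
        return None
    return frame[offset:]


# Build a minimal Ethernet + IPv4 + TCP frame carrying 5 payload bytes.
ip_header = bytearray(20)
ip_header[0] = 0x45          # version 4, header length 5 words
ip_header[9] = IPPROTO_TCP
tcp_header = bytearray(20)
tcp_header[12] = 0x50        # data offset 5 words
frame = bytes(12) + b"\x08\x00" + bytes(ip_header) + bytes(tcp_header) + b"hello"
print(app_layer_payload(frame))   # prints b'hello'
```

A frame whose protocol is not recognized returns None here, which loosely corresponds to the "Unsupported Prot" counter in the statistics.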
The BlueField RegEx engine accepts jobs up to a maximum length of 16KB; longer jobs are rejected as invalid. This DOCA-only feature gives the user the ability to supply huge jobs (up to 2GB in length).
To enable this option, provide the -w option with an optional positive integer argument that defines the size of the window to use. This window size can be in the range of 0 to 16383 (default is 32).
Internally, the DOCA framework fragments the job into smaller buffers that can be accepted by the hardware and then reassembles the results of the fragmented searches into a single result (the framework takes care of pruning any duplicate results).
The window is effectively the number of bytes appended to the start of a job, which belong to the end of the previous job fragment. This "window" effectively moves forward through data looking for matches within it.
A match that is up to sliding-window-size bytes long is guaranteed to be found. Any match longer than the window size may be missed if it happens to appear across the boundary of two fragments. Therefore, correct selection of the sliding window size is paramount.
Please note this mode has a performance impact as some job data may need to be scanned twice. Therefore, it is recommended that you use the smallest possible window size necessary for your case.
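The fragmentation and de-duplication described above can be simulated in software. The sketch below is not the DOCA implementation: Python's `re` module stands in for the RegEx engine, the function name is hypothetical, the window is assumed to count towards the maximum fragment size, and the example uses a tiny fragment size so the effect is easy to see.

```python
import re

MAX_JOB_BYTES = 16 * 1024   # maximum job length stated above


def scan_with_sliding_window(pattern, data, window=32, max_job=MAX_JOB_BYTES):
    """Scan data in fragments, each prefixed by the previous fragment's tail.

    Returns sorted (start_offset, length) pairs with duplicates removed.
    """
    if not 0 <= window < max_job:
        raise ValueError("window must be in the range 0..max_job-1")
    regex = re.compile(pattern)
    step = max_job - window
    found = set()               # a set prunes matches seen in two fragments
    start = 0
    while True:
        fragment = data[start:start + max_job]
        for match in regex.finditer(fragment):
            found.add((start + match.start(), match.end() - match.start()))
        if start + max_job >= len(data):
            break
        start += step
    return sorted(found)


data = b"x" * 10 + b"needle" + b"x" * 10

# The 6-byte match straddles a fragment boundary; a 6-byte window finds it.
print(scan_with_sliding_window(rb"needle", data, window=6, max_job=12))
# prints [(10, 6)]

# Without a window the same match is split across fragments and missed.
print(scan_with_sliding_window(rb"needle", data, window=0, max_job=12))
# prints []
```

The overlapping bytes are scanned twice, which is the performance cost mentioned above and the reason to keep the window as small as the longest expected match allows.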
RXPBench provides a latency figure as part of the "RXP Stats" section. This latency figure can be calculated in two different ways depending on your requirements.
By default, when RXPBench executes, it keeps the RXP hardware queue filled with as much data as possible. This provides the maximum performance but does not provide a true representation of the hardware latency in the calculated statistics.
To see the actual latency of the RXP hardware, you must enable latency mode. Once enabled, this mode batches together packets (either 64 packets, or a user-supplied value using the -g batching option) and then waits on the results of that batch before calculating the latency.
Using this method, the latency returned and displayed shows a truer representation of the hardware latency offered by the RXP offload engine which can be compared to that of Hyperscan.
Hyperscan does not support both flags being enabled at the same time.
The Hyperscan algorithm provides an option called "HS_FLAG_SINGLEMATCH". Please see the Hyperscan documentation for more information.
The Hyperscan algorithm provides an option called "HS_FLAG_SOM_LEFTMOST". Please see the Hyperscan documentation for more information.
RXPBench utilizes the DPDK framework to provide packet operations and hardware-accelerated regular expression (RegEx) offloading (
RXPBench can run in the following input modes: Port, PCAP, or text file.
- In port mode, live traffic is received from the DPDK port specified in the --dpdk-primary-port configuration option. If the secondary port option exists (--dpdk-second-port), then any packet received, after pattern matching has occurred, is transmitted onto the second port.
- In PCAP mode, traffic is supplied via an external PCAP file. This allows for reproducible results using a known input file. The entire payload recorded in each frame within the PCAP file is made available to RXPBench.
- Text file mode allows the user to select any file when processing of a standard text file is required. The entire text file contents are made available to the RXPBench application with no parsing or changes made.
To run RXPBench on BlueField, follow these steps:
- Refer to the NVIDIA DOCA Installation Guide for Linux for details on how to install BlueField related software.
The RXPBench tool is supplied in both binary and source package formats as described earlier in this document.
- Before executing RXPBench, an installation of Hyperscan must be present on the host. Hyperscan can be obtained from the Linux distribution package manager (apt, dpkg, yum, etc.) or alternatively compiled from the source. Depending on the Linux distribution on the host, the following Hyperscan versions are required:
|Host Linux Distribution||Hyperscan Version||Installation Command|
|Ubuntu 18.04||4||apt install libhyperscan4|
|||||apt install libhyperscan5|
|CentOS 7.x||5||Hyperscan is provided through 3rd party vendors. The following commands install Hyperscan 5.3.0 on CentOS 7: "yum install epel-release" followed by "sudo yum install http://repo.openfusion.net/centos7-x86_64/hyperscan-5.3.0-1.of.el7.x86_64.rpm"|
|CentOS 8.x||5||Hyperscan is provided through 3rd party vendors. The following commands install Hyperscan 5.3.0 on CentOS 8: "yum install epel-release" followed by "sudo yum install https://download-ib01.fedoraproject.org/pub/epel/8/Everything/x86_64/Packages/h/hyperscan-5.3.0-5.el8.x86_64.rpm"|
- Build the RXPBench tool from the source code. RXPBench source code packages are found in the following locations:
- Ubuntu 22.04 – /usr/share/doca-host-repo-ubuntu2204-2.2.0/repo/pool/
- Ubuntu 20.04 – /usr/share/doca-host-repo-ubuntu2004-2.2.0/repo/pool/
- Ubuntu 18.04 – /usr/share/doca-host-repo-ubuntu1804-2.2.0/repo/pool/
- CentOS 8.2 – /usr/share/doca-host-repo-rhel82-2.2.0/repo/Packages/
- CentOS 7.6 – /usr/share/doca-host-repo-rhel76-2.2.0/repo/Packages/
The source code can be unpacked using, for example, the following commands.
- For Ubuntu/Debian:
dpkg-source -x rxpbench_23.04.0.dsc
- For CentOS:
rpmbuild --recompile rxpbench-23.04-1.el7.src.rpm
- To rebuild the RXPBench tool, run:
cd <source extract directory>/rxpbench-23.04.0
make
The RXPBench executable will be located in the build subdirectory.
The build process depends on the PKG_CONFIG_PATH environment variable to locate the DPDK libraries. If the variable has been accidentally corrupted and the build fails, run the following command.
- For Ubuntu/Debian:
- For CentOS:
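Before rebuilding, it can help to confirm whether any directory listed in PKG_CONFIG_PATH actually holds the DPDK pkg-config file. The sketch below is only a diagnostic aid, not part of the RXPBench build; it assumes DPDK's pkg-config file is named libdpdk.pc, and the helper name `find_pc_file` is hypothetical. The demo uses a temporary directory rather than a real DPDK installation.

```python
import os
import tempfile


def find_pc_file(pc_name, pkg_config_path):
    """Return the directories of a PKG_CONFIG_PATH-style string that contain pc_name."""
    hits = []
    for directory in pkg_config_path.split(os.pathsep):
        # Empty entries come from leading, trailing, or doubled separators.
        if directory and os.path.isfile(os.path.join(directory, pc_name)):
            hits.append(directory)
    return hits


# Demo with a throwaway directory standing in for a DPDK pkgconfig directory.
with tempfile.TemporaryDirectory() as good_dir:
    open(os.path.join(good_dir, "libdpdk.pc"), "w").close()
    demo_path = os.pathsep.join(["/nonexistent/pkgconfig", good_dir])
    found = find_pc_file("libdpdk.pc", demo_path)
    found_matches_good_dir = found == [good_dir]

print(found_matches_good_dir)
```

On a real system, the call would be `find_pc_file("libdpdk.pc", os.environ.get("PKG_CONFIG_PATH", ""))`. An empty result is not conclusive on its own, because pkg-config also searches its built-in default directories.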
- RXPBench requires the following configurations to enable RegEx.
- On the host side, stop the driver. Run:
host$ sudo /etc/init.d/openibd stop
- Log onto the BlueField-2 and enable host access to the RegEx engine by running the following command:
dpu$ echo 1 > /sys/bus/pci/devices/0000\:03\:00.0/regex/pf/regex_en
- Verify that the service is running. Run:
dpu$ systemctl status mlx-regex
- On the host, start the driver and add hugepages. Run:
host$ sudo /etc/init.d/openibd start
host$ echo 1024 | sudo tee /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
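As a quick sanity check on the hugepage setting above, the amount of memory reserved follows directly from the page size in the sysfs path and the page count written to it. The helper below is a hypothetical illustration of that arithmetic only; it does not touch the system.

```python
import re


def hugepage_pool_bytes(sysfs_path, page_count):
    """Bytes reserved when page_count pages are requested from the pool named in sysfs_path."""
    match = re.search(r"hugepages-(\d+)kB", sysfs_path)
    if match is None:
        raise ValueError("no hugepages-<size>kB component in path")
    page_size_bytes = int(match.group(1)) * 1024
    return page_size_bytes * page_count


path = "/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages"
total = hugepage_pool_bytes(path, 1024)
print(total // 2**30, "GiB")  # 1024 pages of 2048 kB each
```

With the values from the command above (1024 pages of 2048 kB), this comes to 2 GiB of host memory set aside for hugepages.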
- To run the application:
cd build
./rxpbench [dpdk_flags] -- [additional application flags]
For example:
./rxpbench -D "-l 0,1,2,3 -n 1 -a 5e:00.0,class=regex --file-prefix=rxpbench -a 5e:00.01" --input-mode text_file -f ../Shakespeare.txt -d rxp -R ../rules.hs -l 2048 -n 10000 -c 1
- This command runs in the text file input mode (--input-mode text_file)
- The input file is Shakespeare.txt (-f ../Shakespeare.txt)
- This command uses the RXP device for pattern matching (-d rxp)
- The RXP device is programmed with the rules specified in ../rules.hs (-R ../rules.hs)
- This command sends 2048 bytes of data to be searched in each job (-l 2048)
- This command reads and processes the input text file 10,000 times (-n 10000)
- This command uses 1 CPU core during the run (-c 1)
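When the same run is repeated with different job lengths or core counts, it can be convenient to assemble the command line programmatically. The sketch below only uses the options that appear in the example command above; the function name `build_rxpbench_args` and its parameters are hypothetical, and nothing here is part of RXPBench itself.

```python
import shlex


def build_rxpbench_args(eal_args, input_file, rules_file, job_len, iterations, cores):
    """Assemble an argument list mirroring the example text-file run."""
    return [
        "./rxpbench",
        "-D", eal_args,               # DPDK flags, passed as one quoted string
        "--input-mode", "text_file",  # text file input mode
        "-f", input_file,             # input file
        "-d", "rxp",                  # use the RXP device for matching
        "-R", rules_file,             # rules to program into the device
        "-l", str(job_len),           # bytes per job
        "-n", str(iterations),        # number of passes over the input
        "-c", str(cores),             # CPU cores used
    ]


args = build_rxpbench_args(
    "-l 0,1,2,3 -n 1 -a 5e:00.0,class=regex --file-prefix=rxpbench -a 5e:00.01",
    "../Shakespeare.txt",
    "../rules.hs",
    job_len=2048,
    iterations=10000,
    cores=1,
)
print(shlex.join(args))
```

Keeping the arguments as a list (for example, to pass to `subprocess.run`) avoids quoting mistakes around the DPDK flag string, which contains spaces.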
Information on the complete set of configuration settings and options may be found in other sections of this document.
As RXPBench executes, statistics will be updated on screen periodically. On exit, summary information will be displayed on screen.
- An open-source text file containing the works of William Shakespeare
- A PCAP of captured HTTP/port 80 traffic (closed source)
- A PCAP taken from the National CyberWatch Mid-Atlantic Collegiate Cyber Defense Competition (MACCDC)
Five different RegEx rulesets are selected:
- The l7-filter application recognition rules
- The well-known open-source snort_pcres, snort_literals, and teakettle_2500 rulesets
- A selection of Web Application Firewall RegEx rules taken from the OWASP core-ruleset
All rules are compiled using the RXP Compiler with default options and the addition of some HTTP keywords as a graylist option. The table below shows the number of rules compiled for each ruleset. Uncompiled rules either contain unsupported PCRE syntax or are considered as "bad" or potential DoS rules by the compiler.
|Ruleset||Rules Compiled/Total Rules|
The rules are run against the different datasets in job lengths of 2KB on a single core of both an x86 host and Arm on the BlueField-2 DPU (model MBF2H516A-EEEOT). The following command line is an example of the RXPBench parameters used in the tests:
rxpbench -D "-l5,6 -n 1 -a 03:00.0,class=regex" --input-mode text_file -f Shakespeare.txt -d rxp -r snort_pcre.rof2.binary -c 1 -s 10 -l 2048
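To illustrate what a job length of 2KB means for a text input, the sketch below cuts a buffer into consecutive 2048-byte jobs. It is an assumption-laden illustration rather than RXPBench's implementation: the function name `split_into_jobs` is hypothetical, and how the tool treats the final short chunk or matches that straddle a job boundary is not stated in this document.

```python
def split_into_jobs(data, job_len=2048):
    """Cut data into consecutive chunks of at most job_len bytes."""
    if job_len <= 0:
        raise ValueError("job_len must be positive")
    return [data[offset:offset + job_len] for offset in range(0, len(data), job_len)]


sample = bytes(5000)  # stand-in for the contents of an input file
jobs = split_into_jobs(sample)
print([len(job) for job in jobs])
```

Here the last job is shorter than the others because the sample length is not a multiple of the job length.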
The table below presents the performance results achieved for the different datasets on both Arm and the x86 host. All results are in Gb/s.
The results show that 50 Gb/s pattern matching throughput can be achieved when applying complex regular expression rulesets to various datasets. Some of the ruleset/dataset combinations show performance below the maximum RXP bandwidth. This is due to a combination of complex rules that require a lot of processing and data that contains many matches or partial matches.
For example, the Owasp-waf rules are known to contain many common English language words that are followed by a "dot star" (.*). This means that, when applied to English language data, a lot of extra processing is required to validate full matches. Our tests show that software algorithms are impacted by similar scenarios and, while the RXP throughput is well below line rate, it still offers a significant performance boost over software.
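The effect of a common word followed by a "dot star" can be illustrated with a small software example. The rule below is hypothetical and not taken from the OWASP core-ruleset, and Python's `re` module is used purely for illustration; it says nothing about how the RXP engine or Hyperscan evaluate rules internally.

```python
import re

# Hypothetical rule: a common English word, a "dot star", then a second word.
rule = re.compile(r"select.*from")
prefix = re.compile(r"select")

lines = [
    "please select a seat near the window",
    "select the rows from the table",
    "they select new members each year",
]

# Lines where the leading word appears, so the rest of the rule must be checked.
prefix_hits = sum(1 for line in lines if prefix.search(line))
# Lines where the whole rule matches.
full_hits = sum(1 for line in lines if rule.search(line))
print(prefix_hits, full_hits)
```

Every line contains the leading word and therefore starts a candidate match, but only one line completes it. On ordinary English text this gap between candidates and full matches is what drives the extra validation work described above.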
The throughput reported by RXPBench when run on both the x86 host and the Arm is approximately the same in almost all cases. This highlights the benefit of the offload engine in that the power of the CPU used for applications has a limited effect on the pattern matching capabilities.
Only the snort_literals ruleset takes a performance hit. Here, the rules produce many matches, which means that more effort is required by the CPU to process the results. Adding a second BlueField-2 Arm core to RXPBench brings the performance up to the level achieved on the x86 host.
The NVIDIA® BlueField®-3 DPU has increased regular expression processing capabilities both in terms of more powerful hardware engines and support for compiler features such as enhanced prefix extraction.
The open-source benchmarking repository for automata-based processing, ANMLZoo, can be used to demonstrate the expected performance of the BlueField-3 DPU using RXPBench. The ANMLZoo repository supplies both sets of regular expressions and corresponding data to be matched against. Four rulesets are selected which correspond to potential pattern matching use cases or benchmarking standards for a DPU. These include:
- ClamAV – a set of RegEx rules designed to detect virus signatures
- Dotstar – RegEx rules generated by a tool used in the evaluation of GPU expression processing
- PowerEn – rules from a developed RegEx benchmarking suite
- Snort – a snapshot of expressions used in IDS/IPS systems
All rules are compiled using the RXP Compiler with default options.
The following table shows the number of rules compiled for each ruleset. Rules that could not be compiled either contain PCRE syntax unsupported by RXPC, require more resources than the hardware can offer, or are considered "bad" by the compiler (e.g., very short rules that are not considered applicable to hardware offload).
The rules are run against their associated datasets split into job lengths of 2KB. All tests are run on a single core on both an x86 host and Arm cores on the BlueField-3 DPU (model 9009D3B600CVAA). The following command line is an example of the RXPBench parameters used in the tests:
rxpbench -D "-l5,6 -n 1 -a 03:00.0,class=regex" --input-mode text_file -f vasim_1MB.input -d doca -r clam_av.rof2.binary -c 1 -s 10 -l 2048
The following table presents the performance results achieved when the rules are applied to their associated datasets (also supplied by the ANMLZoo repo) on both Arm and the x86 host. Results are reported in Gb/s.
The results reveal that real-world regular expression rulesets such as Snort can be processed at speeds close to 75 Gb/s on the BlueField-3 DPU. 75 Gb/s is the theoretical upper limit of the regular expression offload engine that can be achieved using random data with no matches. In reality, specifically crafted data may be able to push throughput even higher.
Rulesets such as PowerEn and Snort produce a lot of matches when applied to their given datasets, but throughput remains high regardless.
The ClamAV rules and dataset are quite complex because they contain a lot of "partial matches". That is, many sections of the data match part of a given rule, so extra processing is required to verify whether a full match is present. Experiments on alternative software-based pattern matchers show that they too suffer a reduction in performance when processing these rulesets. The BlueField-3 result of close to 25 Gb/s is still significantly higher than what software can achieve using a reasonable number of CPU cores.
It should also be noted that the throughput reported by RXPBench when run on both the x86 host and the Arm is approximately the same in almost all cases. This highlights the benefit of the offload engine in that the power of the CPU used for applications has a limited effect on the pattern-matching capabilities.
The following graph plots the throughput achieved on the Snort test as the job length is increased from 64 bytes to 4 KB. The experiment is run using 4 Arm cores on the DPU to negate any software overhead that might occur when generating small packets.
The graph indicates that job lengths of 64 bytes can be processed by the BlueField-3 engine at speeds of greater than 8 Gb/s. When job lengths reach approximately 512 bytes or above, the maximum rate for the given rules and data is achieved.
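The throughput figures can also be read as a job rate, which helps when comparing small and large job lengths. The sketch below converts the figures quoted in the text (more than 8 Gb/s at 64-byte jobs, close to 75 Gb/s at 2KB jobs) into jobs per second. It assumes 1 Gb/s means 10^9 bits per second, and the helper name `jobs_per_second` is hypothetical; because the quoted figures are approximate, the results are rough estimates only.

```python
def jobs_per_second(throughput_gbps, job_len_bytes):
    """Convert a throughput in Gb/s into jobs per second for a fixed job length."""
    bits_per_job = job_len_bytes * 8
    return throughput_gbps * 1e9 / bits_per_job


small_jobs = jobs_per_second(8, 64)     # 64-byte jobs at 8 Gb/s
large_jobs = jobs_per_second(75, 2048)  # 2KB jobs at 75 Gb/s
print(f"{small_jobs:,.0f} jobs/s at 64 bytes")
print(f"{large_jobs:,.0f} jobs/s at 2048 bytes")
```

Under these assumptions, the 64-byte case corresponds to a higher job rate than the 2KB case even though its bit rate is far lower.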
This document is provided for information purposes only and shall not be regarded as a warranty of a certain functionality, condition, or quality of a product. NVIDIA Corporation nor any of its direct or indirect subsidiaries and affiliates (collectively: “NVIDIA”) make no representations or warranties, expressed or implied, as to the accuracy or completeness of the information contained in this document and assume no responsibility for any errors contained herein. NVIDIA shall have no liability for the consequences or use of such information or for any infringement of patents or other rights of third parties that may result from its use. This document is not a commitment to develop, release, or deliver any Material (defined below), code, or functionality.
NVIDIA products are sold subject to the NVIDIA standard terms and conditions of sale supplied at the time of order acknowledgement, unless otherwise agreed in an individual sales agreement signed by authorized representatives of NVIDIA and customer (“Terms of Sale”). NVIDIA hereby expressly objects to applying any customer general terms and conditions with regards to the purchase of the NVIDIA product referenced in this document. No contractual obligations are formed either directly or indirectly by this document.
NVIDIA products are not designed, authorized, or warranted to be suitable for use in medical, military, aircraft, space, or life support equipment, nor in applications where failure or malfunction of the NVIDIA product can reasonably be expected to result in personal injury, death, or property or environmental damage. NVIDIA accepts no liability for inclusion and/or use of NVIDIA products in such equipment or applications and therefore such inclusion and/or use is at customer’s own risk.
NVIDIA makes no representation or warranty that products based on this document will be suitable for any specified use. Testing of all parameters of each product is not necessarily performed by NVIDIA. It is customer’s sole responsibility to evaluate and determine the applicability of any information contained in this document, ensure the product is suitable and fit for the application planned by customer, and perform the necessary testing for the application in order to avoid a default of the application or the product. Weaknesses in customer’s product designs may affect the quality and reliability of the NVIDIA product and may result in additional or different conditions and/or requirements beyond those contained in this document. NVIDIA accepts no liability related to any default, damage, costs, or problem which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this document or (ii) customer product designs.
No license, either expressed or implied, is granted under any NVIDIA patent right, copyright, or other NVIDIA intellectual property right under this document. Information published by NVIDIA regarding third-party products or services does not constitute a license from NVIDIA to use such products or services or a warranty or endorsement thereof. Use of such information may require a license from a third party under the patents or other intellectual property rights of the third party, or a license from NVIDIA under the patents or other intellectual property rights of NVIDIA.
Reproduction of information in this document is permissible only if approved in advance by NVIDIA in writing, reproduced without alteration and in full compliance with all applicable export laws and regulations, and accompanied by all associated conditions, limitations, and notices.
THIS DOCUMENT AND ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, “MATERIALS”) ARE BEING PROVIDED “AS IS.” NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL NVIDIA BE LIABLE FOR ANY DAMAGES, INCLUDING WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, ARISING OUT OF ANY USE OF THIS DOCUMENT, EVEN IF NVIDIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. Notwithstanding any damages that customer might incur for any reason whatsoever, NVIDIA’s aggregate and cumulative liability towards customer for the products described herein shall be limited in accordance with the Terms of Sale for the product.
NVIDIA, the NVIDIA logo, and Mellanox are trademarks and/or registered trademarks of Mellanox Technologies Ltd. and/or NVIDIA Corporation in the U.S. and in other countries. The registered trademark Linux® is used pursuant to a sublicense from the Linux Foundation, the exclusive licensee of Linus Torvalds, owner of the mark on a worldwide basis. Other company and product names may be trademarks of the respective companies with which they are associated.
© 2023 NVIDIA Corporation & affiliates. All rights reserved.