DOCA Compress
This guide provides instructions on how to use the DOCA Compress API.
DOCA Compress library provides an API to compress and decompress data using hardware acceleration, supporting both host and NVIDIA® BlueField® DPU memory regions.
The library provides an API for executing compress operations on DOCA buffers, where these buffers reside in either the DPU memory or host memory.
Using DOCA Compress, compress and decompress memory operations can be easily executed in an optimized, hardware-accelerated manner.
This document is intended for software developers wishing to accelerate their application's compress memory operations.
The DOCA Compress library follows the architecture of a DOCA Core Context. It is recommended to read the following sections before proceeding:
DOCA Compress-based applications can run either on the host machine or on the BlueField DPU target.
DOCA Compress can only run on a DPU configured in DPU mode, as described in NVIDIA BlueField DPU Modes of Operation.
DOCA Compress is a DOCA Context as defined by DOCA Core. See NVIDIA DOCA Core Context for more information.
DOCA Compress leverages DOCA Core architecture to expose asynchronous tasks that are offloaded to hardware.
Supported Compress/Decompress Algorithms
For BlueField-2 devices, this library supports:
Compress operation using the deflate algorithm
Decompress operation using the deflate algorithm
For BlueField-3 devices, this library supports:
Decompress operation using the deflate algorithm
Decompress operation using the LZ4 algorithm
Adler and CRC
All compress/decompress tasks produce Adler and CRC.
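When validating an integration, it can help to compare the values reported by a task against a software reference. The following is a minimal sketch, assuming the reported values are the standard Adler-32 (RFC 1950) and CRC-32 (RFC 1952) checksums of the uncompressed data; this guide does not state that explicitly, so treat it as an assumption to verify. The helper names sw_adler32 and sw_crc32 are hypothetical and are not part of the DOCA API.

```c
#include <stddef.h>
#include <stdint.h>

/* Adler-32 as defined in RFC 1950. Hypothetical helper, not a DOCA API. */
static uint32_t sw_adler32(const uint8_t *data, size_t len)
{
	uint32_t a = 1, b = 0;

	for (size_t i = 0; i < len; i++) {
		a = (a + data[i]) % 65521u; /* 65521 is the largest prime below 2^16 */
		b = (b + a) % 65521u;
	}
	return (b << 16) | a;
}

/* CRC-32 as used by gzip (RFC 1952): bitwise variant, reflected polynomial. */
static uint32_t sw_crc32(const uint8_t *data, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= data[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return crc ^ 0xFFFFFFFFu;
}
```

The bitwise CRC loop is slow but short; a table-driven variant would be the usual choice if the reference check runs on large inputs.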
Objects
Device and Device Representor
The library requires a DOCA device to operate. The device is used to access memory and perform the actual compress/decompress operation. See DOCA Core Device Discovery for information.
For the same BlueField DPU, it does not matter which device is used (PF/VF/SF), as all these devices utilize the same hardware component. If there are multiple DPUs, it is possible to create a Compress instance per DPU, providing each instance with a device from a different DPU.
To access memory that is not local (from the host to the DPU or vice versa), the DPU side of the application must pick a device with an appropriate representor. See DOCA Core Device Representor Discovery.
The device must stay valid as long as the Compress instance is not destroyed.
Memory Buffers
All compress/decompress tasks require two DOCA buffers containing the destination and the source. Depending on the allocation pattern of the buffers, refer to the Inventory Types table.
Buffers must not be modified or read during the compress/decompress operation.
Source and Destination Location
DOCA Compress can process DOCA buffers that reside on the host, the DPU, or both.
Local Host
Source and destination buffers reside on the host and the compress library runs on the host.
Local DPU
Source and destination buffers reside on the DPU and the compress library runs on the DPU.
Remote
Source at Host, Destination at DPU
The source resides on the host and is exported (DOCA mmap export) to the DPU
The destination resides on the DPU
The compress library runs on the DPU and compresses/decompresses the host source to the DPU destination
Source at DPU, Destination at Host
The source resides on the DPU
The destination resides on the host and is exported (DOCA mmap export) to the DPU
Compress library runs on the DPU and compresses/decompresses the DPU source to the host destination
To start using the library, the user must go through a configuration phase as described in DOCA Core Context Configuration Phase.
This section describes how to configure and start the context, to allow execution of tasks and retrieval of events.
Configurations
The context can be configured to match the use case of the application.
To find if a configuration is supported or what its min/max value is, refer to Device Support.
Mandatory Configurations
The following configurations must be set by the application before attempting to start the context:
At least one task/event type must be configured. See configuration of Tasks.
A device with appropriate support must be provided upon creation
Device Support
DOCA Compress requires a device to operate. To pick a device, see DOCA Core Device Discovery.
As device capabilities may change in the future (see DOCA Core Device Support), it is recommended to select your device using the following APIs:
Supported Tasks
doca_compress_cap_task_compress_deflate_is_supported
doca_compress_cap_task_decompress_deflate_is_supported
doca_compress_cap_task_decompress_lz4_is_supported
Supported Buffer Size
doca_compress_cap_task_compress_deflate_get_max_buf_size
doca_compress_cap_task_decompress_deflate_get_max_buf_size
doca_compress_cap_task_decompress_lz4_get_max_buf_size
Buffer Support
Tasks support buffers with the following features:
Buffer Type | Source Buffer | Destination Buffer |
Linked List Buffer | Yes | No |
Local mmap Buffer | Yes | Yes |
mmap From PCI Export Buffer | Yes | Yes |
mmap From RDMA Export Buffer | No | No |
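The support table above can be encoded as a small pre-submission check. This is an illustrative sketch only: the buf_feature flags and the two helper functions are hypothetical names invented here, not DOCA types or APIs, and the application would have to track these features itself.

```c
#include <stdbool.h>

/* Buffer features from the table above. Hypothetical flags, not DOCA types. */
enum buf_feature {
	BUF_LINKED_LIST           = 1 << 0,
	BUF_LOCAL_MMAP            = 1 << 1,
	BUF_MMAP_FROM_PCI_EXPORT  = 1 << 2,
	BUF_MMAP_FROM_RDMA_EXPORT = 1 << 3
};

/* A source buffer may be a linked list, but not from an RDMA-exported mmap. */
static bool source_supported(unsigned features)
{
	return (features & BUF_MMAP_FROM_RDMA_EXPORT) == 0;
}

/* A destination buffer may be neither a linked list nor from an RDMA export. */
static bool destination_supported(unsigned features)
{
	return (features & (BUF_LINKED_LIST | BUF_MMAP_FROM_RDMA_EXPORT)) == 0;
}
```

For example, a linked list of local mmap buffers passes source_supported but fails destination_supported, matching the "Yes | No" row.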
This section describes execution on CPU or DPU using DOCA Core Progress Engine.
Tasks
Compress Deflate Task
This task facilitates compressing memory using buffers as described in section "Buffer Support".
Configuration
Description | API to Set the Configuration | API to Query Support |
Enable the task | doca_compress_task_compress_deflate_set_conf | doca_compress_cap_task_compress_deflate_is_supported |
Number of tasks | doca_compress_task_compress_deflate_set_conf | doca_compress_get_max_num_tasks (max total num tasks) |
Maximal buffer size | – | doca_compress_cap_task_compress_deflate_get_max_buf_size |
Maximum buffer list size | – | doca_compress_cap_task_compress_deflate_get_max_buf_list_len |
Input
Common input as described in DOCA Core Task.
Name | Description | Notes |
Source buffer | Buffer pointing to the memory to be compressed | Only the data residing in the data segment is compressed |
Destination buffer | Buffer pointing to where compressed memory will be stored | The data is compressed to the tail segment extending the data segment |
Output
Common output as described in DOCA Core Task.
Task Successful Completion
After the task completes successfully, the following happens:
The source data is compressed to destination
The destination buffer data segment is extended to include the compressed data
Adler can be retrieved by calling doca_compress_task_compress_deflate_get_adler_cs
CRC can be retrieved by calling doca_compress_task_compress_deflate_get_crc_cs
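The way the destination's data segment grows into its tail on success can be pictured with a toy model. The sketch below is not the doca_buf implementation; struct toy_buf and its helpers are hypothetical stand-ins that only mirror the behavior described above (output lands in the tail segment and the data segment is extended to cover it).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for a DOCA buffer: a memory range with a data segment inside. */
struct toy_buf {
	uint8_t *head;   /* start of the underlying memory range */
	size_t len;      /* total size of the range */
	size_t data_off; /* offset of the data segment from head */
	size_t data_len; /* length of the data segment */
};

/* Free space after the data segment (the "tail segment"). */
static size_t toy_buf_tail_len(const struct toy_buf *b)
{
	return b->len - b->data_off - b->data_len;
}

/*
 * Write produced bytes into the tail and extend the data segment over them.
 * Returns 0 on success, or -1 (buffer left unmodified) if the tail is too small.
 */
static int toy_buf_append(struct toy_buf *b, const uint8_t *out, size_t out_len)
{
	if (out_len > toy_buf_tail_len(b))
		return -1;
	memcpy(b->head + b->data_off + b->data_len, out, out_len);
	b->data_len += out_len;
	return 0;
}
```

In this model, an application that wants the output at the start of the destination memory would submit a destination whose data segment is empty, so that the whole range counts as tail.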
Task Failed Completion
If the task fails midway:
The context may enter stopping state if a fatal error occurs
The source and destination doca_buf objects are not modified
The destination buffer contents may be modified
Limitations
The operation is not atomic
Once the task has been submitted, the source and destination should not be read/written to
Source and destination must not overlap
Other limitations are described in DOCA Core Task
Decompress Deflate Task
This task facilitates decompressing memory using buffers as described in section "Buffer Support".
Configuration
Description | API to Set the Configuration | API to Query Support |
Enable the task | doca_compress_task_decompress_deflate_set_conf | doca_compress_cap_task_decompress_deflate_is_supported |
Number of tasks | doca_compress_task_decompress_deflate_set_conf | doca_compress_get_max_num_tasks (max total num tasks) |
Maximal buffer size | – | doca_compress_cap_task_decompress_deflate_get_max_buf_size |
Maximum buffer list size | – | doca_compress_cap_task_decompress_deflate_get_max_buf_list_len |
Input
Common input as described in DOCA Core Task.
Name | Description | Notes |
Source buffer | Buffer pointing to the memory to be decompressed | Only the data residing in the data segment is decompressed |
Destination buffer | Buffer pointing to where decompressed memory will be stored | The data is decompressed to the tail segment extending the data segment |
Output
Common output as described in DOCA Core Task.
Task Successful Completion
After the task completes successfully, the following happens:
The source data is decompressed to destination
The destination buffer data segment is extended to include the decompressed data
Adler can be retrieved by calling doca_compress_task_decompress_deflate_get_adler_cs
CRC can be retrieved by calling doca_compress_task_decompress_deflate_get_crc_cs
Task Failed Completion
If the task fails midway:
The context may enter stopping state if a fatal error occurs
The source and destination doca_buf objects are not modified
The destination buffer contents may be modified
Limitations
The operation is not atomic
Once the task has been submitted, the source and destination should not be read/written to
Source and destination must not overlap
Other limitations are described in DOCA Core Task
Decompress LZ4 Task
This task facilitates decompressing memory using buffers as described in section "Buffer Support".
Configuration
Description | API to Set the Configuration | API to Query Support |
Enable the task | doca_compress_task_decompress_lz4_set_conf | doca_compress_cap_task_decompress_lz4_is_supported |
Number of tasks | doca_compress_task_decompress_lz4_set_conf | doca_compress_get_max_num_tasks (max total num tasks) |
Maximal buffer size | – | doca_compress_cap_task_decompress_lz4_get_max_buf_size |
Maximum buffer list size | – | doca_compress_cap_task_decompress_lz4_get_max_buf_list_len |
Input
Common input as described in DOCA Core Task.
Name | Description | Notes |
Source buffer | Buffer pointing to the memory to be decompressed | Only the data residing in the data segment is decompressed |
Destination buffer | Buffer pointing to where decompressed memory will be stored | The data is decompressed to the tail segment extending the data segment |
Output
Common output as described in DOCA Core Task.
Task Successful Completion
After the task completes successfully:
The source data is decompressed to destination
The destination buffer data segment is extended to include the decompressed data
Adler can be retrieved by calling doca_compress_task_decompress_lz4_get_adler_cs
CRC can be retrieved by calling doca_compress_task_decompress_lz4_get_crc_cs
Task Failed Completion
If the task fails midway:
The context may enter stopping state if a fatal error occurs
The source and destination doca_buf objects are not modified
The destination buffer contents may be modified
Limitations
The operation is not atomic
Once the task has been submitted, the source and destination should not be read/written to
Source and destination must not overlap
Other limitations are described in DOCA Core Task
When using an LZ4 operation, the source buffer must be from local memory.
Events
DOCA Compress exposes asynchronous events to notify about changes that happen unexpectedly according to DOCA Core architecture.
The only events DOCA Compress exposes are common events (doca ctx state changed). For more information, see DOCA Core Event.
The DOCA Compress library follows the Context state machine described in DOCA Core Context State Machine.
This section describes how to move states and what is allowed in each state.
States
Idle
In this state, it is expected that the application:
Destroys the context
Starts the context
Allowed operations:
Configuring the context according to Configurations
Starting the context
It is possible to reach this state as follows:
Previous State | Transition Action |
None | Create the context |
Running | Call stop after making sure all tasks have been freed |
Stopping | Call progress until all tasks are completed and freed |
Starting
This state cannot be reached.
Running
In this state, it is expected that the application:
Allocates and submits tasks
Calls progress to complete tasks and/or receive events
Allowed operations:
Allocate previously configured task
Submit a task
Call stop
It is possible to reach this state as follows:
Previous State | Transition Action |
Idle | Call start after configuration |
Stopping
In this state, it is expected that the application:
Calls progress to complete all inflight tasks (tasks will complete with failure)
Frees any completed tasks
Allowed operations:
Call progress
It is possible to reach this state as follows:
Previous State | Transition Action |
Running | Call progress and fatal error occurs |
Running | Call stop without freeing all tasks |
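The transitions listed for the Idle, Running, and Stopping states can be summarized as a toy state machine. This is a simplified model for reasoning about the rules, not DOCA code: struct toy_ctx and the toy_* functions are hypothetical, and the real context reports these transitions through DOCA Core return codes and state-change events.

```c
#include <stdbool.h>

enum ctx_state { CTX_IDLE, CTX_RUNNING, CTX_STOPPING };

/* Toy context: current state plus the number of tasks not yet freed. */
struct toy_ctx {
	enum ctx_state state;
	unsigned inflight;
};

/* Idle -> Running (start after configuration). */
static bool toy_start(struct toy_ctx *c)
{
	if (c->state != CTX_IDLE)
		return false;
	c->state = CTX_RUNNING;
	return true;
}

/* Tasks may only be submitted while Running. */
static bool toy_submit(struct toy_ctx *c)
{
	if (c->state != CTX_RUNNING)
		return false;
	c->inflight++;
	return true;
}

/* Running -> Idle if every task was freed, otherwise Running -> Stopping. */
static bool toy_stop(struct toy_ctx *c)
{
	if (c->state != CTX_RUNNING)
		return false;
	c->state = (c->inflight == 0) ? CTX_IDLE : CTX_STOPPING;
	return true;
}

/* Running -> Stopping when progress hits a fatal error. */
static void toy_fatal_error(struct toy_ctx *c)
{
	if (c->state == CTX_RUNNING)
		c->state = CTX_STOPPING;
}

/* One progress call that completes and frees a task; Stopping -> Idle when none remain. */
static void toy_progress_and_free(struct toy_ctx *c)
{
	if (c->inflight > 0)
		c->inflight--;
	if (c->state == CTX_STOPPING && c->inflight == 0)
		c->state = CTX_IDLE;
}
```

The model makes one rule easy to see: once the context is Stopping, the only way back to Idle is to keep calling progress until every in-flight task has completed and been freed.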
DOCA Compress only supports datapath on CPU. See Execution Phase.
When using an LZ4 operation, the source buffer must be from local memory.
The following samples illustrate how to use the DOCA Compress API to compress and decompress files.
DOCA Compress handles the payload only. To create a compressed file (e.g., gzip), the developer must add the file-format header (e.g., a gzip header) to the compressed payload. To decompress such a file, the developer must strip the header before submitting the payload.
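As an illustration of the header handling described above, the following sketch wraps a raw deflate payload into a minimal gzip member and locates the payload inside one, following the layout in RFC 1952 (10-byte header, payload, then CRC-32 and original length in little-endian order). The function names gzip_wrap and gzip_unwrap are hypothetical and not part of DOCA. It assumes the CRC passed in is the CRC-32 of the uncompressed data, and gzip_unwrap only handles files whose FLG byte is 0 (no optional header fields such as a file name).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GZIP_HEADER_LEN 10
#define GZIP_TRAILER_LEN 8

/* Store a 32-bit value in little-endian order, as RFC 1952 requires. */
static void put_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
	p[2] = (uint8_t)(v >> 16);
	p[3] = (uint8_t)(v >> 24);
}

/*
 * Wrap a raw deflate payload into a gzip member. crc and orig_len describe
 * the uncompressed data. Returns bytes written, or 0 if out is too small.
 */
static size_t gzip_wrap(const uint8_t *payload, size_t payload_len,
			uint32_t crc, uint32_t orig_len,
			uint8_t *out, size_t out_cap)
{
	static const uint8_t header[GZIP_HEADER_LEN] = {
		0x1f, 0x8b, /* gzip magic */
		0x08,       /* CM: deflate */
		0x00,       /* FLG: no optional fields */
		0, 0, 0, 0, /* MTIME: not available */
		0x00,       /* XFL */
		0xff        /* OS: unknown */
	};
	size_t total = GZIP_HEADER_LEN + payload_len + GZIP_TRAILER_LEN;

	if (out_cap < total)
		return 0;
	memcpy(out, header, GZIP_HEADER_LEN);
	memcpy(out + GZIP_HEADER_LEN, payload, payload_len);
	put_le32(out + GZIP_HEADER_LEN + payload_len, crc);
	put_le32(out + GZIP_HEADER_LEN + payload_len + 4, orig_len);
	return total;
}

/* Locate the raw deflate payload in a gzip member with FLG == 0. Returns 0 or -1. */
static int gzip_unwrap(const uint8_t *in, size_t in_len,
		       const uint8_t **payload, size_t *payload_len)
{
	if (in_len < GZIP_HEADER_LEN + GZIP_TRAILER_LEN)
		return -1;
	if (in[0] != 0x1f || in[1] != 0x8b || in[2] != 0x08 || in[3] != 0x00)
		return -1;
	*payload = in + GZIP_HEADER_LEN;
	*payload_len = in_len - GZIP_HEADER_LEN - GZIP_TRAILER_LEN;
	return 0;
}
```

Files produced by common gzip tools often set FLG bits (for example, to store the original file name), so a production parser would need to skip those optional fields instead of rejecting the file.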
Running the Sample
Refer to the following documents:
NVIDIA DOCA Installation Guide for Linux for details on how to install BlueField-related software.
NVIDIA DOCA Troubleshooting Guide for any issue you may encounter with the installation, compilation, or execution of DOCA samples.
To build a given sample:
cd /opt/mellanox/doca/samples/doca_compress/<sample_name>
meson /tmp/build
ninja -C /tmp/build
The binary doca_<sample_name> is created under /tmp/build/.
Sample (e.g., doca_compress_deflate) usage:
Usage: doca_compress_deflate [DOCA Flags] [Program Flags]

DOCA Flags:
  -h, --help                Print a help synopsis
  -v, --version             Print program version information
  -l, --log-level           Set the (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
  --sdk-log-level           Set the SDK (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
  -j, --json <path>         Parse all command flags from an input json file

Program Flags:
  -p, --pci-addr            DOCA device PCI device address
  -f, --file                input file to compress/decompress
  -o, --output              output file
  -c, --output-checksum     Output checksum

For additional information per sample, use the -h option:
/tmp/build/doca_<sample_name> -h
Samples
Compress/Decompress Deflate
This sample illustrates how to use DOCA Compress library to compress or decompress a file.
The sample logic includes:
Locating a DOCA device.
Initializing the required DOCA core structures.
Populating DOCA memory map with two relevant buffers; one for the source data and one for the result.
Allocating elements in DOCA buffer inventory for each buffer.
Allocating and initializing a DOCA Compress deflate task or a DOCA Decompress deflate task.
Submitting the compress or decompress task.
Running the progress engine until the task is completed.
Writing the result into an output file, out.txt.
Destroying all compress and DOCA core structures.
References:
/opt/mellanox/doca/samples/doca_compress/compress_deflate/compress_deflate_sample.c
/opt/mellanox/doca/samples/doca_compress/compress_deflate/compress_deflate_main.c
/opt/mellanox/doca/samples/doca_compress/compress_deflate/meson.build
/opt/mellanox/doca/samples/doca_compress/decompress_deflate/decompress_deflate_sample.c
/opt/mellanox/doca/samples/doca_compress/decompress_deflate/decompress_deflate_main.c
/opt/mellanox/doca/samples/doca_compress/decompress_deflate/meson.build
/opt/mellanox/doca/samples/doca_compress/compress_common.h
/opt/mellanox/doca/samples/doca_compress/compress_common.c
/opt/mellanox/doca/samples/doca_compress/meson.build