DOCA Compress

1.0

This guide provides instructions on how to use the DOCA Compress API.

DOCA Compress library provides an API to compress and decompress data using hardware acceleration, supporting both host and NVIDIA® BlueField® DPU memory regions.

The library provides an API for executing compress operations on DOCA buffers, where these buffers reside in either the DPU memory or host memory.

Using DOCA Compress, compress and decompress memory operations can be easily executed in an optimized, hardware-accelerated manner.

This document is intended for software developers wishing to accelerate their application’s compress memory operations.

The DOCA Compress library follows the architecture of a DOCA Core Context. It is recommended to read the following sections before proceeding:

DOCA Compress-based applications can run either on the host machine or on the BlueField DPU target.

Compress can only be run with a DPU configured with DPU mode as described in NVIDIA BlueField Modes of Operation.

DOCA Compress is a DOCA Context as defined by DOCA Core. See NVIDIA DOCA Core Context for more information.

DOCA Compress leverages DOCA Core architecture to expose asynchronous tasks that are offloaded to hardware.


Supported Compress/Decompress Algorithms

For BlueField-2 devices, this library supports:

  • Compress operation using the deflate algorithm

  • Decompress operation using the deflate algorithm

For BlueField-3 devices, this library supports:

  • Decompress operation using the deflate algorithm

  • Decompress operation using the LZ4 algorithm

Supported Checksum Methods

Depending on the task type, the following checksum methods are produced and may be retrieved using the relevant getter functions:

  • Adler – produced by the deflate compress and decompress tasks, as well as the LZ4 decompress task

  • CRC – produced by all tasks

  • xxHash – produced by the LZ4 stream and block decompress tasks

Refer to “Tasks” section for more information.
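For host-side verification, a reference checksum can be computed in software and compared against the value returned by the task getters. The following is a minimal sketch using Python's standard zlib module. It assumes that the hardware Adler and CRC values follow the standard Adler-32 and CRC-32 definitions used by zlib and gzip and are computed over the uncompressed data, which this guide does not state explicitly. reference_checksums is a hypothetical helper and not part of DOCA.

```python
import zlib

def reference_checksums(chunks):
    """Return (adler32, crc32) computed incrementally over an iterable of byte chunks."""
    adler = 1  # Adler-32 initial value
    crc = 0    # CRC-32 initial value
    for chunk in chunks:
        adler = zlib.adler32(chunk, adler)
        crc = zlib.crc32(chunk, crc)
    return adler, crc

# Feeding the data in pieces gives the same result as one pass over the whole input.
adler, crc = reference_checksums([b"hello ", b"world"])
print(f"adler32=0x{adler:08x} crc32=0x{crc:08x}")
```

Computing the values incrementally lets the same helper be used when the input is read from a file in pieces.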

Objects

Device and Device Representor

The library requires a DOCA device to operate. The device is used to access memory and perform the actual compress/decompress operation. See DOCA Core Device Discovery for information.

For the same BlueField DPU, it does not matter which device is used (PF/VF/SF), as all these devices utilize the same hardware component. If there are multiple DPUs, it is possible to create a Compress instance per DPU, providing each instance with a device from a different DPU.

To access memory that is not local (from the host to the DPU or vice versa), the DPU side of the application must pick a device with an appropriate representor. See DOCA Core Device Representor Discovery.

The device must stay valid as long as the Compress instance is not destroyed.

Memory Buffers

All compress/decompress tasks require two DOCA buffers containing the destination and the source. Depending on the allocation pattern of the buffers, refer to the Inventory Types table.

Buffers must not be modified or read during the compress/decompress operation.

Source and Destination Location

DOCA Compress can process DOCA buffers that reside on the host, the DPU, or both.

Local Host

Source and destination buffers reside on the host and the compress library runs on the host.

Local DPU

Source and destination buffers reside on the DPU and the compress library runs on the DPU.

Remote

Source at Host, Destination at DPU

  • The source resides on the host and is exported (DOCA mmap export) to the DPU

  • The destination resides on the DPU

  • The compress library runs on the DPU and compresses/decompresses the host source to the DPU destination

Source at DPU, Destination at Host

  • The source resides on the DPU

  • The destination resides on the host and is exported (DOCA mmap export) to the DPU

  • The compress library runs on the DPU and compresses/decompresses the DPU source to the host destination

To start using the library, the user must go through a configuration phase as described in DOCA Core Context Configuration Phase.

This section describes how to configure and start the context, to allow execution of tasks and retrieval of events.

Configurations

The context can be configured to match the use case of the application.

To find if a configuration is supported or what its min/max value is, refer to Device Support.

Mandatory Configurations

The following configurations must be set by the application before attempting to start the context:

  • At least one task/event type must be configured. See configuration of Tasks.

  • A device with appropriate support must be provided upon creation

Device Support

DOCA Compress requires a device to operate. To pick a device, see DOCA Core Device Discovery.

As device capabilities may change in the future (see DOCA Core Device Support), it is recommended to select your device using the following APIs:

Supported Tasks

  • doca_compress_cap_task_compress_deflate_is_supported

  • doca_compress_cap_task_decompress_deflate_is_supported

  • doca_compress_cap_task_decompress_lz4_is_supported

  • doca_compress_cap_task_decompress_lz4_stream_is_supported

  • doca_compress_cap_task_decompress_lz4_block_is_supported

Supported Buffer Size

  • doca_compress_cap_task_compress_deflate_get_max_buf_size

  • doca_compress_cap_task_decompress_deflate_get_max_buf_size

  • doca_compress_cap_task_decompress_lz4_get_max_buf_size

  • doca_compress_cap_task_decompress_lz4_stream_get_max_buf_size

  • doca_compress_cap_task_decompress_lz4_block_get_max_buf_size

Buffer Support

Tasks support buffers with the following features:

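| Buffer Type                  | Source Buffer | Destination Buffer |
|------------------------------|---------------|--------------------|
| Linked List Buffer           | Yes           | No                 |
| Local mmap Buffer            | Yes           | Yes                |
| mmap From PCI Export Buffer  | Yes           | Yes                |
| mmap From RDMA Export Buffer | No            | No                 |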


This section describes execution on CPU or DPU using DOCA Core Progress Engine.

Tasks

Compress Deflate Task

This task facilitates compressing memory, with the deflate algorithm, using buffers as described in section “Buffer Support”.

Note

DOCA Compress returns only the payload. To create a compressed file (e.g., gzip), the developer must add a gzip header/trailer.
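As an illustration of adding the header and trailer on the host, the sketch below wraps a raw deflate payload in a minimal gzip member as defined by RFC 1952. The hardware output is not available here, so zlib with wbits=-15 (raw deflate) stands in for it. wrap_deflate_as_gzip is a hypothetical helper, and using a CRC-32 of the uncompressed data in the trailer follows the gzip format rather than anything stated in this guide.

```python
import gzip
import struct
import zlib

def wrap_deflate_as_gzip(raw_deflate: bytes, crc32: int, original_size: int) -> bytes:
    """Wrap a raw deflate payload in a minimal gzip member (RFC 1952)."""
    header = struct.pack(
        "<BBBBIBB",
        0x1F, 0x8B,  # gzip magic bytes
        8,           # CM: deflate
        0,           # FLG: no optional header fields
        0,           # MTIME: not recorded
        0,           # XFL: no extra flags
        255,         # OS: unknown
    )
    # Trailer: CRC-32 and length (mod 2^32) of the uncompressed data, little-endian.
    trailer = struct.pack("<II", crc32 & 0xFFFFFFFF, original_size & 0xFFFFFFFF)
    return header + raw_deflate + trailer

# Stand-in for the task output: zlib with wbits=-15 emits a raw deflate stream.
data = b"DOCA Compress example payload " * 8
compressor = zlib.compressobj(wbits=-15)
payload = compressor.compress(data) + compressor.flush()

gz = wrap_deflate_as_gzip(payload, zlib.crc32(data), len(data))
assert gzip.decompress(gz) == data  # a standard gzip reader accepts the result
```

In an application, the crc32 argument would be the value retrieved from the completed task, assuming it matches the gzip CRC-32 definition.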

Configuration

| Description | API to Set the Configuration | API to Query Support |
|---|---|---|
| Enable the task | doca_compress_task_compress_deflate_set_conf | doca_compress_cap_task_compress_deflate_is_supported |
| Number of tasks | doca_compress_task_compress_deflate_set_conf | doca_compress_get_max_num_tasks (max total num tasks) |
| Maximal buffer size | - | doca_compress_cap_task_compress_deflate_get_max_buf_size |
| Maximum buffer list size | - | doca_compress_cap_task_compress_deflate_get_max_buf_list_len |


Input

Common input as described in DOCA Core Task.

| Name | Description | Notes |
|---|---|---|
| Source buffer | Buffer pointing to the memory to be compressed | Only the data residing in the data segment is compressed |
| Destination buffer | Buffer pointing to where compressed memory will be stored | The data is compressed to the tail segment extending the data segment |


Output

Common output as described in DOCA Core Task.

Task Successful Completion

After the task completes successfully, the following happens:

  • The source data is compressed to destination

  • The destination buffer data segment is extended to include the compressed data

  • Adler can be retrieved by calling doca_compress_task_compress_deflate_get_adler_cs

  • CRC can be retrieved by calling doca_compress_task_compress_deflate_get_crc_cs
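The effect on the destination buffer can be pictured with a toy model: the result is written into the free tail of the buffer, and the data segment then grows to cover it. BufModel below is a hypothetical illustration and not the DOCA buffer API. It simplifies by starting the data segment at the beginning of the memory.

```python
class BufModel:
    """Toy model of a buffer with a data segment followed by a free tail segment."""

    def __init__(self, capacity: int):
        self.memory = bytearray(capacity)
        self.data_len = 0  # bytes currently covered by the data segment

    @property
    def tail_len(self) -> int:
        """Free bytes after the data segment."""
        return len(self.memory) - self.data_len

    def append_result(self, result: bytes) -> None:
        """Write a task result into the tail and extend the data segment over it."""
        if len(result) > self.tail_len:
            raise ValueError("result does not fit in the tail segment")
        start = self.data_len
        self.memory[start:start + len(result)] = result
        self.data_len += len(result)

    def data(self) -> bytes:
        return bytes(self.memory[:self.data_len])

dst = BufModel(capacity=16)
dst.append_result(b"abcd")  # first result
dst.append_result(b"efgh")  # a second result lands after the first
print(dst.data(), dst.tail_len)
```

In this model, data already present in the destination is preserved and new output is appended after it, so a destination intended to hold only the result starts with an empty data segment.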

Task Failed Completion

If the task fails midway:

  • The context may enter stopping state if a fatal error occurs

  • The source and destination doca_buf objects are not modified

  • The destination buffer contents may be modified

Limitations

  • The operation is not atomic

  • Once the task has been submitted, the source and destination should not be read/written to

  • Source and destination must not overlap

  • Other limitations are described in DOCA Core Task

Decompress Deflate Task

This task facilitates decompressing memory, with the deflate algorithm, using buffers as described in section “Buffer Support”.

Note

DOCA Compress expects the payload alone. To decompress a file (e.g., gzip), the developer must strip the header/trailer.
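As an illustration of stripping the header and trailer on the host, the sketch below extracts the raw deflate payload from a single-member gzip file following RFC 1952. strip_gzip is a hypothetical helper. zlib with wbits=-15 stands in for the hardware decompressor, and the sketch does not handle multi-member files.

```python
import gzip
import struct
import zlib

FHCRC, FEXTRA, FNAME, FCOMMENT = 0x02, 0x04, 0x08, 0x10  # gzip FLG bits

def strip_gzip(member: bytes):
    """Return (raw_deflate, crc32, isize) from a single-member gzip file (RFC 1952)."""
    if member[:3] != b"\x1f\x8b\x08":
        raise ValueError("not a gzip member using deflate")
    flags = member[3]
    pos = 10  # fixed-size part of the header
    if flags & FEXTRA:
        (xlen,) = struct.unpack_from("<H", member, pos)
        pos += 2 + xlen
    if flags & FNAME:
        pos = member.index(b"\x00", pos) + 1  # zero-terminated file name
    if flags & FCOMMENT:
        pos = member.index(b"\x00", pos) + 1  # zero-terminated comment
    if flags & FHCRC:
        pos += 2  # header CRC16
    # Trailer: CRC-32 and length (mod 2^32) of the uncompressed data.
    crc32, isize = struct.unpack_from("<II", member, len(member) - 8)
    return member[pos:-8], crc32, isize

data = b"DOCA Compress example payload " * 8
payload, crc32, isize = strip_gzip(gzip.compress(data))
# zlib with wbits=-15 stands in for the raw-deflate decompress task.
assert zlib.decompress(payload, wbits=-15) == data
```

The returned crc32 and isize can be compared against the task's CRC and the decompressed length, assuming the hardware CRC matches the gzip CRC-32 definition.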

Configuration

| Description | API to Set the Configuration | API to Query Support |
|---|---|---|
| Enable the task | doca_compress_task_decompress_deflate_set_conf | doca_compress_cap_task_decompress_deflate_is_supported |
| Number of tasks | doca_compress_task_decompress_deflate_set_conf | doca_compress_get_max_num_tasks (max total num tasks) |
| Maximal buffer size | - | doca_compress_cap_task_decompress_deflate_get_max_buf_size |
| Maximum buffer list size | - | doca_compress_cap_task_decompress_deflate_get_max_buf_list_len |


Input

Common input as described in DOCA Core Task.

| Name | Description | Notes |
|---|---|---|
| Source buffer | Buffer pointing to the memory to be decompressed | Only the data residing in the data segment is decompressed |
| Destination buffer | Buffer pointing to where decompressed memory will be stored | The data is decompressed to the tail segment extending the data segment |


Output

Common output as described in DOCA Core Task.

Task Successful Completion

After the task completes successfully, the following happens:

  • The source data is decompressed to destination

  • The destination buffer data segment is extended to include the decompressed data

  • Adler can be retrieved by calling doca_compress_task_decompress_deflate_get_adler_cs

  • CRC can be retrieved by calling doca_compress_task_decompress_deflate_get_crc_cs

Task Failed Completion

If the task fails midway:

  • The context may enter stopping state if a fatal error occurs

  • The source and destination doca_buf objects are not modified

  • The destination buffer contents may be modified

Limitations

  • The operation is not atomic

  • Once the task has been submitted, the source and destination should not be read/written to

  • Source and destination must not overlap

  • Other limitations are described in DOCA Core Task

Decompress LZ4 Tasks

These tasks facilitate decompressing memory with the LZ4 algorithm, using buffers as described in section “Buffer Support”.

The main differences between the tasks are:

  • The input data format –

    • The decompress LZ4 task expects the input data to be a full LZ4 frame

    • The decompress LZ4 stream task expects a stream of one or more blocks, without the frame (i.e., the magic number, frame descriptor, and content checksum)

    • The decompress LZ4 block task expects a single, compressed, data-only block (i.e., without block size or block checksum)

  • Support for remote buffers – in the decompress LZ4 task, the source buffer must be from local memory, whereas the stream and block tasks do not have this limitation
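To make the three input formats concrete, the sketch below splits an LZ4 frame into its descriptor flags and its data-only blocks, following the public LZ4 frame format. split_lz4_frame is a hypothetical helper, not a DOCA function. The header checksum byte is skipped rather than verified, and this guide does not state whether the stream task expects the EndMark, so the sketch only identifies the pieces.

```python
import struct

LZ4_MAGIC = 0x184D2204  # stored little-endian at the start of a frame

def split_lz4_frame(frame: bytes):
    """Split an LZ4 frame into descriptor flags and a list of data-only blocks.

    Returns (flags, blocks). Each block is (is_uncompressed, block_data) with
    the block size field and any block checksum removed.
    """
    (magic,) = struct.unpack_from("<I", frame, 0)
    if magic != LZ4_MAGIC:
        raise ValueError("not an LZ4 frame")
    flg = frame[4]
    flags = {
        "are_blocks_independent": bool(flg & 0x20),
        "has_block_checksum": bool(flg & 0x10),
        "has_content_size": bool(flg & 0x08),
        "has_content_checksum": bool(flg & 0x04),
        "has_dict_id": bool(flg & 0x01),
    }
    pos = 6  # magic (4 bytes) + FLG (1) + BD (1)
    pos += 8 if flags["has_content_size"] else 0
    pos += 4 if flags["has_dict_id"] else 0
    pos += 1  # header checksum byte (not verified in this sketch)
    blocks = []
    while True:
        (size_field,) = struct.unpack_from("<I", frame, pos)
        pos += 4
        if size_field == 0:  # EndMark terminates the block sequence
            break
        size = size_field & 0x7FFFFFFF
        is_uncompressed = bool(size_field & 0x80000000)
        blocks.append((is_uncompressed, frame[pos:pos + size]))
        pos += size + (4 if flags["has_block_checksum"] else 0)
    return flags, blocks

# Hand-built frame holding one stored (uncompressed) block. The header
# checksum byte is a placeholder because this sketch does not verify it.
frame = (
    struct.pack("<I", LZ4_MAGIC)
    + bytes([0x60, 0x40, 0x00])          # FLG: version 01, independent blocks; BD: 64 KB; HC placeholder
    + struct.pack("<I", 0x80000000 | 5)  # block size with the "uncompressed" bit set
    + b"hello"
    + struct.pack("<I", 0)               # EndMark
)
flags, blocks = split_lz4_frame(frame)
print(flags["are_blocks_independent"], blocks)
```

The two flags correspond to the "has block checksum" and "are blocks independent" inputs of the stream task, and each compressed entry in blocks has the data-only shape the block task expects.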

Decompress LZ4 Task

Note

This task type is deprecated and will be removed in a future DOCA release.

This task facilitates decompressing memory using buffers as described in section “Buffer Support”.

Configuration

| Description | API to Set the Configuration | API to Query Support |
|---|---|---|
| Enable the task | doca_compress_task_decompress_lz4_set_conf | doca_compress_cap_task_decompress_lz4_is_supported |
| Number of tasks | doca_compress_task_decompress_lz4_set_conf | doca_compress_get_max_num_tasks (max total num tasks) |
| Maximal buffer size | - | doca_compress_cap_task_decompress_lz4_get_max_buf_size |
| Maximum buffer list size | - | doca_compress_cap_task_decompress_lz4_get_max_buf_list_len |


Input

Common input as described in DOCA Core Task.

| Name | Description | Notes |
|---|---|---|
| Source buffer | Buffer pointing to the memory to be decompressed | Only the data residing in the data segment is decompressed |
| Destination buffer | Buffer pointing to where decompressed memory will be stored | The data is decompressed to the tail segment extending the data segment |


Output

Common output as described in DOCA Core Task.

Task Successful Completion

After the task completes successfully:

  • The source data is decompressed to destination

  • The destination buffer data segment is extended to include the decompressed data

  • Adler can be retrieved by calling doca_compress_task_decompress_lz4_get_adler_cs

  • CRC can be retrieved by calling doca_compress_task_decompress_lz4_get_crc_cs

Task Failed Completion

If the task fails midway:

  • The context may enter stopping state if a fatal error occurs

  • The source and destination doca_buf objects are not modified

  • The destination buffer contents may be modified

Limitations

  • The operation is not atomic

  • Once the task has been submitted, the source and destination should not be read/written to

  • Source and destination must not overlap

  • Other limitations are described in DOCA Core Task

Note

This task supports only a source buffer from local memory.

Decompress LZ4 Stream Task

This task facilitates decompressing memory with the LZ4 algorithm, using buffers as described in section “Buffer Support”.

Note

The decompress LZ4 stream task expects a stream of one or more blocks without the frame (i.e., the magic number, frame descriptor, and content checksum).

Configuration

| Description | API to Set the Configuration | API to Query Support |
|---|---|---|
| Enable the task | doca_compress_task_decompress_lz4_stream_set_conf | doca_compress_cap_task_decompress_lz4_stream_is_supported |
| Number of tasks | doca_compress_task_decompress_lz4_stream_set_conf | doca_compress_get_max_num_tasks (max total num tasks) |
| Maximal buffer size | - | doca_compress_cap_task_decompress_lz4_stream_get_max_buf_size |
| Maximum buffer list size | - | doca_compress_cap_task_decompress_lz4_stream_get_max_buf_list_len |


Input

Common input as described in DOCA Core Task.

| Name | Description | Notes |
|---|---|---|
| Has block checksum flag | A flag to indicate whether or not the blocks in the stream have a checksum | 1 if the task should expect blocks in the stream to have a checksum; 0 otherwise |
| Are blocks independent flag | A flag to indicate whether or not each block depends on previous blocks in the stream | 1 if the task should expect blocks to be independent; 0 otherwise (dependent blocks) |
| Source buffer | Buffer pointing to the memory to be decompressed | Only the data residing in the data segment is decompressed |
| Destination buffer | Buffer pointing to where decompressed memory will be stored | The data is decompressed to the tail segment extending the data segment |


Output

Common output as described in DOCA Core Task.

Task Successful Completion

After the task completes successfully:

  • The source data is decompressed to destination

  • The destination buffer data segment is extended to include the decompressed data

  • CRC can be retrieved by calling doca_compress_task_decompress_lz4_stream_get_crc_cs

  • xxHash can be retrieved by calling doca_compress_task_decompress_lz4_stream_get_xxh_cs

Task Failed Completion

If the task fails midway:

  • The context may enter stopping state if a fatal error occurs

  • The source and destination doca_buf objects are not modified

  • The destination buffer contents may be modified

Limitations

  • The operation is not atomic

  • Once the task has been submitted, the source and destination should not be read/written to

  • Source and destination must not overlap

  • Other limitations are described in DOCA Core Task

Decompress LZ4 Block Task

This task facilitates decompressing memory with the LZ4 algorithm, using buffers as described in section “Buffer Support”.

Note

The decompress LZ4 block task expects a single, compressed, data-only block (i.e., without block size or block checksum).

Configuration

| Description | API to Set the Configuration | API to Query Support |
|---|---|---|
| Enable the task | doca_compress_task_decompress_lz4_block_set_conf | doca_compress_cap_task_decompress_lz4_block_is_supported |
| Number of tasks | doca_compress_task_decompress_lz4_block_set_conf | doca_compress_get_max_num_tasks (max total num tasks) |
| Maximal buffer size | - | doca_compress_cap_task_decompress_lz4_block_get_max_buf_size |
| Maximum buffer list size | - | doca_compress_cap_task_decompress_lz4_block_get_max_buf_list_len |


Input

Common input as described in DOCA Core Task.

| Name | Description | Notes |
|---|---|---|
| Source buffer | Buffer pointing to the memory to be decompressed | Only the data residing in the data segment is decompressed |
| Destination buffer | Buffer pointing to where decompressed memory will be stored | The data is decompressed to the tail segment extending the data segment |


Output

Common output as described in DOCA Core Task.

Task Successful Completion

After the task completes successfully:

  • The source data is decompressed to destination

  • The destination buffer data segment is extended to include the decompressed data

  • CRC can be retrieved by calling doca_compress_task_decompress_lz4_block_get_crc_cs

  • xxHash can be retrieved by calling doca_compress_task_decompress_lz4_block_get_xxh_cs

Task Failed Completion

If the task fails midway:

  • The context may enter stopping state if a fatal error occurs

  • The source and destination doca_buf objects are not modified

  • The destination buffer contents may be modified

Limitations

  • The operation is not atomic

  • Once the task has been submitted, the source and destination should not be read/written to

  • Source and destination must not overlap

  • Other limitations are described in DOCA Core Task

Events

DOCA Compress exposes asynchronous events to notify about changes that happen unexpectedly according to DOCA Core architecture.

The only events DOCA Compress exposes are common events (doca ctx state changed). See DOCA Core Event for more information.

The DOCA Compress library follows the Context state machine described in DOCA Core Context State Machine.

This section describes how to move states and what is allowed in each state.

States

Idle

In this state, it is expected that the application:

  • Destroys the context

  • Starts the context

Allowed operations:

  • Configuring the context according to Configurations

  • Starting the context

It is possible to reach this state as follows:

| Previous State | Transition Action |
|---|---|
| None | Create the context |
| Running | Call stop after making sure all tasks have been freed |
| Stopping | Call progress until all tasks are completed and freed |


Starting

This state cannot be reached.

Running

In this state, it is expected that the application:

  • Allocates and submits tasks

  • Calls progress to complete tasks and/or receive events

Allowed operations:

  • Allocate previously configured task

  • Submit a task

  • Call stop

It is possible to reach this state as follows:

| Previous State | Transition Action |
|---|---|
| Idle | Call start after configuration |


Stopping

In this state, it is expected that the application:

  • Calls progress to complete all inflight tasks (tasks will complete with failure)

  • Frees any completed tasks

Allowed operations:

  • Call progress

It is possible to reach this state as follows:

| Previous State | Transition Action |
|---|---|
| Running | Call progress and a fatal error occurs |
| Running | Call stop without freeing all tasks |
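The transitions listed for the Idle, Running, and Stopping states can be summarized with a toy model. ContextModel below is a hypothetical illustration that only tracks the state and the number of tasks not yet freed. It does not call or mirror the DOCA API, and it collapses "progress completes a task and the application frees it" into a single step.

```python
from enum import Enum

class State(Enum):
    IDLE = "idle"
    RUNNING = "running"
    STOPPING = "stopping"

class ContextModel:
    """Toy model of the Idle/Running/Stopping transitions of a context."""

    def __init__(self):
        self.state = State.IDLE
        self.inflight = 0  # tasks submitted and not yet freed

    def start(self):
        if self.state is not State.IDLE:
            raise RuntimeError("start is only allowed in Idle")
        self.state = State.RUNNING

    def submit(self):
        if self.state is not State.RUNNING:
            raise RuntimeError("tasks can only be submitted in Running")
        self.inflight += 1

    def complete_and_free_one(self):
        """Stand-in for a progress call that completes one task, which is then freed."""
        if self.inflight:
            self.inflight -= 1
        if self.state is State.STOPPING and self.inflight == 0:
            self.state = State.IDLE

    def stop(self):
        if self.state is not State.RUNNING:
            raise RuntimeError("stop is only allowed in Running")
        # With no outstanding tasks the context goes straight back to Idle.
        self.state = State.IDLE if self.inflight == 0 else State.STOPPING

    def fatal_error(self):
        if self.state is State.RUNNING:
            self.state = State.STOPPING

ctx = ContextModel()
ctx.start()
ctx.submit()
ctx.stop()                   # a task is still in flight
print(ctx.state)             # State.STOPPING
ctx.complete_and_free_one()
print(ctx.state)             # State.IDLE
```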

DOCA Compress supports datapath on CPU only. See Execution Phase.

The following samples illustrate how to use the DOCA Compress API to compress and decompress files.

Note

DOCA Compress handles the payload only, unless the zc flag is used (available only for the deflate samples). In that case, a zlib header and trailer are added during compression and are expected as part of the input during decompression.
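For reference, a zlib stream as defined by RFC 1950 is a 2-byte header, the raw deflate payload, and a big-endian Adler-32 of the uncompressed data. The sketch below builds one in Python. wrap_deflate_as_zlib is a hypothetical helper, the header bytes 0x78 0x9C are the ones zlib emits at its default settings, and whether the samples write exactly these bytes is an assumption. zlib with wbits=-15 stands in for the hardware output.

```python
import struct
import zlib

def wrap_deflate_as_zlib(raw_deflate: bytes, adler32: int) -> bytes:
    """Wrap a raw deflate payload in a zlib stream (RFC 1950)."""
    # CMF 0x78: deflate with a 32 KB window; FLG 0x9C: default level, no preset dictionary.
    header = bytes([0x78, 0x9C])
    # Trailer: Adler-32 of the uncompressed data, big-endian.
    trailer = struct.pack(">I", adler32 & 0xFFFFFFFF)
    return header + raw_deflate + trailer

# Stand-in for the task output: zlib with wbits=-15 emits a raw deflate stream.
data = b"DOCA Compress example payload " * 8
compressor = zlib.compressobj(wbits=-15)
payload = compressor.compress(data) + compressor.flush()

stream = wrap_deflate_as_zlib(payload, zlib.adler32(data))
assert zlib.decompress(stream) == data  # standard zlib accepts the result
```

Going the other way, the payload of such a stream is stream[2:-4], provided the header does not declare a preset dictionary.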

Running the Sample

  1. Refer to the following documents:

  2. To build a given sample:

    cd /opt/mellanox/doca/samples/doca_compress/<sample_name>
    meson /tmp/build
    ninja -C /tmp/build

    Info

    The binary doca_<sample_name> is created under /tmp/build/.

  3. Sample (e.g., doca_compress_deflate) usage:

    • Common arguments

      Usage: doca_<sample_name> [DOCA Flags] [Program Flags]

      DOCA Flags:
        -h, --help                Print a help synopsis
        -v, --version             Print program version information
        -l, --log-level           Set the (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
        --sdk-log-level           Set the SDK (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
        -j, --json <path>         Parse all command flags from an input json file

      Program Flags:
        -p, --pci-addr            DOCA device PCI device address
        -f, --file                Input file to compress/decompress
        -o, --output              Output file
        -c, --output-checksum     Output checksum

    • Sample-specific arguments

      | Sample | Argument | Description |
      |---|---|---|
      | Compress/Decompress Deflate | -zc, --zlib-compatible | Write/read a file compatible with default zlib settings |
      | Decompress LZ4 Stream | -bc, --has-block-checksum | Flag to indicate if blocks have a checksum |
      | Decompress LZ4 Stream | -bi, --are-blocks-independent | Flag to indicate if blocks are independent |

  4. For additional information per sample, use the -h option:

    /tmp/build/doca_<sample_name> -h

Samples

Compress/Decompress Deflate

This sample illustrates how to use DOCA Compress library to compress or decompress a file.

The sample logic includes:

  1. Locating a DOCA device.

  2. Initializing the required DOCA Core structures.

  3. Populating DOCA memory map with two relevant buffers; one for the source data and one for the result.

  4. Allocating elements in DOCA buffer inventory for each buffer.

  5. Allocating and initializing a DOCA Compress deflate task or a DOCA Decompress deflate task.

  6. Submitting the task.

  7. Running the progress engine until the task is completed.

  8. Writing the result into an output file, out.txt.

  9. Destroying all DOCA Compress and DOCA Core structures.

References:

  • /opt/mellanox/doca/samples/doca_compress/compress_deflate/compress_deflate_sample.c

  • /opt/mellanox/doca/samples/doca_compress/compress_deflate/compress_deflate_main.c

  • /opt/mellanox/doca/samples/doca_compress/compress_deflate/meson.build

  • /opt/mellanox/doca/samples/doca_compress/decompress_deflate/decompress_deflate_sample.c

  • /opt/mellanox/doca/samples/doca_compress/decompress_deflate/decompress_deflate_main.c

  • /opt/mellanox/doca/samples/doca_compress/decompress_deflate/meson.build

  • /opt/mellanox/doca/samples/doca_compress/compress_common.h

  • /opt/mellanox/doca/samples/doca_compress/compress_common.c

Decompress LZ4 Stream

This sample illustrates how to use DOCA Compress library to decompress a file using the LZ4 stream decompress task.

The sample logic includes:

  1. Locating a DOCA device.

  2. Initializing the required DOCA Core structures.

  3. Populating DOCA memory map with two relevant buffers; one for the source data and one for the result.

  4. Allocating elements in DOCA buffer inventory for each buffer.

  5. Allocating and initializing a DOCA Decompress LZ4 stream task.

  6. Submitting the task.

  7. Running the progress engine until the task is completed.

  8. Writing the result into an output file, out.txt.

  9. Destroying all DOCA Compress and DOCA Core structures.

References:

  • /opt/mellanox/doca/samples/doca_compress/decompress_lz4_stream/decompress_lz4_stream_sample.c

  • /opt/mellanox/doca/samples/doca_compress/decompress_lz4_stream/decompress_lz4_stream_main.c

  • /opt/mellanox/doca/samples/doca_compress/decompress_lz4_stream/meson.build

  • /opt/mellanox/doca/samples/doca_compress/compress_common.h

  • /opt/mellanox/doca/samples/doca_compress/compress_common.c

Backward Compatibility

Decompress LZ4 Task

The decompress LZ4 task is deprecated and will be removed in a future release. It is recommended to use the decompress LZ4 stream task or the decompress LZ4 block task instead.

© Copyright 2024, NVIDIA. Last updated on Feb 9, 2024.