Description Operators#
Description operators define the compression/decompression problem we want to solve at compile time. Combined with execution operators, they form a complete function descriptor that can be executed on a device.
Operator | Options/Arguments | Description
---|---|---
`Direction` | `direction::compress`, `direction::decompress` | Direction of the operation
`Algorithm` | `algorithm::ans`, `algorithm::lz4` | Algorithm used
`DataType` | `datatype::uint8`, `datatype::uint16`, `datatype::uint32`, `datatype::float16` | The unit in which the (de)compressor interprets the data
`MaxUncompChunkSize` | non-zero `ChunkSize` | The maximum uncompressed chunk size to be expected in bytes. Only applicable for compressors.
`SM` | target architecture | Target CUDA architecture for which the nvCOMPDx function should be generated
Operators are added (in arbitrary order) to construct the operation descriptor type. For example, to describe a compression use case with the ANS algorithm, 32 KiB raw chunks, a 1-byte data type, and the Ampere architecture, one would write:

```cpp
#include <nvcompdx.hpp>

using namespace nvcompdx;

using COMP = decltype(Algorithm<algorithm::ans>() +
                      DataType<datatype::uint8>() +
                      Direction<direction::compress>() +
                      MaxUncompChunkSize<32768>() +
                      SM<800>());
```
Analogously, for the matching decompressor one would write:

```cpp
#include <nvcompdx.hpp>

using namespace nvcompdx;

using DECOMP = decltype(Algorithm<algorithm::ans>() +
                        DataType<datatype::uint8>() +
                        Direction<direction::decompress>() +
                        SM<800>());
```
Compressors and decompressors require different sets of description operators to be complete.
For a compressor descriptor to be complete, the following description operators are required:

- Exactly one Direction Operator with the option `nvcompdx::direction::compress`
- Exactly one Algorithm Operator
- Exactly one Data Type Operator
- Exactly one Maximum Uncompressed Chunk Size Operator
- Exactly one SM Operator
For a decompressor descriptor to be complete, the following description operators are required:

- Exactly one Direction Operator with the option `nvcompdx::direction::decompress`
- Exactly one Algorithm Operator
- Exactly one Data Type Operator
- Exactly one SM Operator
Direction Operator#
```cpp
nvcompdx::Direction<nvcompdx::direction D>()
```

```cpp
namespace nvcompdx {
  enum class direction {
    compress,
    decompress
  };
}
```
Sets the direction of operation: compression or decompression. In an application where both directions are necessary, it might be beneficial to set up a base descriptor type and specialize it later with a direction.
```cpp
using namespace nvcompdx;

// Base type
using BASE = decltype(... + SM<800>());

// Compression description
using COMP = decltype(BASE() + Direction<direction::compress>());

// Decompression description
using DECOMP = decltype(BASE() + Direction<direction::decompress>());
```
Algorithm Operator#
```cpp
nvcompdx::Algorithm<nvcompdx::algorithm A>()
```

```cpp
namespace nvcompdx {
  enum class algorithm {
    ans,
    lz4,
  };
}
```
Sets the underlying algorithm for compression and decompression.
nvCOMPDx currently supports only the LZ4 and ANS algorithms, but these algorithms are interoperable with our main compressor library, nvCOMP: chunks compressed by nvCOMPDx can be decompressed by nvCOMP, and chunks compressed by nvCOMP can be decompressed by nvCOMPDx.
A detailed description of the LZ4 compression scheme is available on the LZ4 compression GitHub page.
Data Type Operator#
```cpp
nvcompdx::DataType<nvcompdx::datatype DT>()
```

```cpp
namespace nvcompdx {
  enum class datatype {
    uint8,
    uint16,
    uint32,
    float16
  };
}
```
Sets the type of input and output data used during compression and decompression.
- `uint8`: Data to be interpreted as consecutive bytes. If the input data type is not included in the options below, `uint8` should be selected.
- `uint16`: Data to be interpreted as consecutive shorts (2 bytes). Requires the total number of input bytes per chunk to be divisible by two.
- `uint32`: Data to be interpreted as consecutive integers (4 bytes). Requires the total number of input bytes per chunk to be divisible by four.
- `float16`: Data to be interpreted as consecutive half-precision floats (2 bytes). Requires the total number of input bytes per chunk to be divisible by two. Note that internally nvCOMPDx does not distinguish between different 16-bit floating-point formats and treats them equally.
The compatible data type/algorithm pairs are listed in the table below:

Algorithm | Data type(s)
---|---
`ans` | `uint8`, `float16`
`lz4` | `uint8`, `uint16`, `uint32`
Maximum Uncompressed Chunk Size Operator#
```cpp
nvcompdx::MaxUncompChunkSize<size_t ChunkSize>()
```

Sets the maximum uncompressed chunk size in bytes that the compressor is expected to receive. Passing an input chunk larger than the specified `ChunkSize` to the compressor results in undefined behavior.
Depending on the data type and algorithm, the supported ranges vary as follows:
Algorithm | Data type | Minimum value (bytes) | Maximum value (bytes)
---|---|---|---
`ans` | `uint8` | 1 | 262,144
`ans` | `float16` | 2 | 262,144
`lz4` | `uint8` | 1 | 16,777,216
`lz4` | `uint16` | 2 | 16,777,216
`lz4` | `uint32` | 4 | 16,777,216
The user is free to choose any value within the supported range, but the generally recommended values are:

- 16,384 bytes (16 KiB)
- 32,768 bytes (32 KiB)
- 65,536 bytes (64 KiB)
SM Operator#
```cpp
nvcompdx::SM<unsigned int CC>()
```

Sets the target architecture `CC` for the underlying compressor and decompressor to use. Generally, it is calculated as `100 * CC major + 10 * CC minor`, where the abbreviation CC stands for Compute Capability.
The architectures supported by nvCOMPDx are:

- Volta: `700` and `720` (sm_70, sm_72)
- Turing: `750` (sm_75)
- Ampere: `800`, `860` and `870` (sm_80, sm_86, sm_87)
- Ada: `890` (sm_89)
- Hopper: `900` (sm_90)
- Blackwell: `1000`, `1010`, `1030`, `1200`, `1210` (sm_100, sm_101, sm_103, sm_120, sm_121)
More information regarding compute capabilities can be found on the official NVIDIA website.
Warning
Starting with nvCOMPDx 0.1.0, support for the NVIDIA Xavier Tegra SoC (`SM<720>`, i.e. sm_72) is deprecated.
Additionally, support for architectures sm_87, sm_103, and sm_121 is experimental in this release.
Note
When compiling for an `XYa` or `XYf` compute capability, use `XY0` in the `SM` operator (see also CUDA C++ Programming Guide: Feature Availability).
Note
If the created descriptor type (which includes the `SM<CC>` operator) is passed to a CUDA kernel as a template parameter, adding the compile-time macro `NVCOMPDX_SKip_IF_NOT_APPLICABLE()` as the first line of the CUDA kernel is recommended. This practice avoids kernel compilation for unintended device architectures, thereby reducing compilation time and the final binary size.
```cpp
#include <nvcompdx.hpp>

template<typename COMP>
__global__ void user_kernel(...) {
  NVCOMPDX_SKIP_IF_NOT_APPLICABLE(COMP);
  // rest of the kernel code ...
}
```