NVIDIA nvCOMPDx Documentation

The nvCOMP Device Extensions (nvCOMPDx) library enables selected compressors and decompressors from nvCOMP to be called from inside CUDA kernels. Fusing these routines with other operations can decrease latency, potentially reduce the global memory footprint, and improve the overall performance of your application. The library is distributed in LTO-IR format, so its device code can be link-time optimized together with your own kernels.
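
As a minimal sketch of what such fusion looks like, the kernel below decompresses one chunk into shared memory and immediately post-processes the bytes in the same kernel, so the intermediate decompressed data never makes a round trip through global memory. The decompress_chunk_stub function is a hypothetical placeholder standing in for the actual nvCOMPDx device-side decompression call (see the API reference for the real types and signatures); everything else is plain CUDA.

    #include <cstdint>
    #include <cstddef>
    #include <cuda_runtime.h>

    // Hypothetical placeholder for a warp- or block-level nvCOMPDx
    // decompression call; it only marks where the library routine would run.
    __device__ size_t decompress_chunk_stub(const uint8_t* comp_chunk,
                                            size_t comp_size,
                                            uint8_t* uncomp_out)
    {
        // ... device-side decompression of one chunk would happen here ...
        return 0; // decompressed size in bytes
    }

    // One thread block handles one compressed chunk: decompress into shared
    // memory, then transform the bytes in place before the single write to
    // global memory. Without fusion this would take two kernel launches and
    // an extra global-memory round trip for the decompressed data.
    __global__ void decompress_and_transform(const uint8_t* comp_data,
                                             const size_t* comp_offsets,
                                             const size_t* comp_sizes,
                                             uint8_t* output,
                                             size_t max_uncomp_chunk_size)
    {
        extern __shared__ uint8_t uncomp_smem[];
        const unsigned chunk = blockIdx.x;

        const size_t uncomp_size =
            decompress_chunk_stub(comp_data + comp_offsets[chunk],
                                  comp_sizes[chunk],
                                  uncomp_smem);
        __syncthreads();

        // Fused post-processing step (here: a trivial byte-wise transform).
        for (size_t i = threadIdx.x; i < uncomp_size; i += blockDim.x) {
            output[chunk * max_uncomp_chunk_size + i] = uncomp_smem[i] ^ 0xFFu;
        }
    }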

nvCOMPDx is part of the MathDx package which also includes cuBLASDx for basic linear algebra subroutines (BLAS), cuFFTDx for FFT calculations, cuRANDDx for random number generation, and cuSolverDx for selected dense matrix factorization and solve routines. All these device extension libraries are designed to work together. When using multiple device extension libraries in a single project, they should all come from the same MathDx release.

Note

The nvCOMPDx library is meant to supersede nvCOMP’s Device API, which was available from nvCOMP v3.0.0 (July 2023) through nvCOMP v4.2.0 (February 2025).

For existing nvCOMP Device API users, the new library offers enhanced functionality that can be unlocked with minimal code changes. We strongly recommend migrating to nvCOMPDx to take advantage of these improvements.

Highlights

The nvCOMPDx library currently provides:

  • Two compression algorithms with support for various data types:

    • LZ4: A general-purpose no-entropy byte-level compressor.

    • ANS: A proprietary entropy encoder based on asymmetric numeral systems.

  • Thread block and warp-level API design.

  • Flexible thread block sizes: nvCOMPDx provides execution operators (illustrated in the sketch after this list) that let you use thread blocks larger than the (de)compression task itself requires.

  • Interoperability with nvCOMP: chunks compressed by nvCOMPDx are decompressible by nvCOMP, and vice versa.

  • Ability to fuse nvCOMPDx routines with arbitrary other operations to save global memory round trips.

  • Support for multiple GPU architectures from Volta and up, including Blackwell.
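
For a sense of the API style, nvCOMPDx follows the same compile-time, operator-based description used by the other MathDx libraries: a (de)compressor type is built by combining operators that select the algorithm, data type, direction, maximum uncompressed chunk size, execution level (warp or block), and target architecture. The snippet below is only a sketch of that pattern; the header name and operator spellings are assumptions modeled on the other Dx libraries and on nvCOMP terminology, so consult the nvCOMPDx API reference for the exact names.

    #include <nvcompdx.hpp> // header name assumed; check the installation notes

    using namespace nvcompdx;

    // Illustrative compile-time description of a warp-level LZ4 decompressor
    // targeting SM 8.0. Operator names follow the MathDx "Dx" convention and
    // may differ from the actual nvCOMPDx operators.
    using lz4_warp_decompressor =
        decltype(Algorithm<algorithm::lz4>() +
                 DataType<datatype::uint8>() +
                 Direction<direction::decompress>() +
                 MaxUncompChunkSize<64 * 1024>() +
                 Warp() +
                 SM<800>());

The resulting type would then be instantiated inside a kernel, which is where fusion with other operations takes place.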