ANS Examples

Location	Description
`04_ans_gpu/ans_gpu_compression_decompression.cu`	Block-level GPU ANS compression and decompression
`04_ans_gpu/ans_gpu_decompression_reduction.cu`	Fused block-level GPU ANS decompression and reduction

The ANS examples, located in the `example/nvcompdx/04_ans_gpu` folder of the nvCOMPDx package, demonstrate how to use nvCOMPDx with the ANS compression algorithm.

The first example, `ans_gpu_compression_decompression.cu`, demonstrates the block-level API of nvCOMPDx with the ANS algorithm. It takes user-provided input data, creates a batch of chunks from it, compresses the batch with ANS, and then decompresses it. Finally, it verifies that the decompressed output is identical to the input. Both compression and decompression use multiple warps per thread block, but the number of warps per thread block does not need to match between the two. Alternatively, either side of the operation (compression or decompression) could be performed with nvCOMP’s low-level API instead.
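The batching and verification steps described above can be sketched on the host without touching the GPU. The following is a minimal illustration only, not code from the example: the `Chunk` struct and the `make_chunks` and `round_trip_matches` helpers are hypothetical names, and it assumes the input is divided into fixed-size chunks with a possibly shorter final chunk. No nvCOMPDx calls are shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical descriptor for one chunk of the input batch.
struct Chunk {
    std::size_t offset;  // byte offset of the chunk within the input buffer
    std::size_t size;    // chunk size in bytes (the last chunk may be shorter)
};

// Split `total_bytes` of input into chunks of at most `chunk_bytes` each.
std::vector<Chunk> make_chunks(std::size_t total_bytes, std::size_t chunk_bytes) {
    assert(chunk_bytes > 0);
    std::vector<Chunk> chunks;
    for (std::size_t offset = 0; offset < total_bytes; offset += chunk_bytes) {
        const std::size_t remaining = total_bytes - offset;
        chunks.push_back({offset, remaining < chunk_bytes ? remaining : chunk_bytes});
    }
    return chunks;
}

// Round-trip check: the decompressed bytes must equal the original input,
// both in length and in content.
bool round_trip_matches(const std::vector<std::uint8_t>& input,
                        const std::vector<std::uint8_t>& output) {
    return input == output;
}
```

For instance, `make_chunks(10, 4)` yields three chunks of 4, 4, and 2 bytes. In a real run, each descriptor would identify the portion of the input that one compression call processes, and `round_trip_matches` would compare the original buffer with the buffer copied back after decompression.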

The second example, `ans_gpu_decompression_reduction.cu`, demonstrates how nvCOMPDx can be used to fuse multiple complex operations into a single kernel, which reduces the number of global-memory accesses, lets the fused operations share resources, and potentially opens up additional compiler optimizations. This particular example fuses the decompression of a compressed buffer of integers with a subsequent thread-block reduction over the decompressed values. The compressed buffer is itself generated with nvCOMPDx beforehand.
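To make the benefit of fusion concrete, here is a small host-side analogy. It is a sketch under loud assumptions: the stand-in codec is run-length encoding rather than ANS, the reduction is a plain sum, and the names `Runs`, `decode_then_reduce`, and `fused_decode_reduce` are hypothetical. The actual example performs both steps inside one CUDA kernel; this sketch only shows why skipping the intermediate buffer saves memory traffic.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// Stand-in "compressed" format: (value, repeat count) pairs. This is
// run-length encoding, chosen only because it is short; it is not ANS.
using Runs = std::vector<std::pair<std::int32_t, std::uint32_t>>;

// Unfused: pass 1 writes the full decompressed buffer, pass 2 reads it back.
std::int64_t decode_then_reduce(const Runs& runs) {
    std::vector<std::int32_t> decoded;
    for (const auto& run : runs) {
        for (std::uint32_t i = 0; i < run.second; ++i) {
            decoded.push_back(run.first);
        }
    }
    return std::accumulate(decoded.begin(), decoded.end(), std::int64_t{0});
}

// Fused: each decoded value is folded into the sum as soon as it is produced,
// so the intermediate buffer is never written or re-read.
std::int64_t fused_decode_reduce(const Runs& runs) {
    std::int64_t sum = 0;
    for (const auto& run : runs) {
        for (std::uint32_t i = 0; i < run.second; ++i) {
            sum += run.first;
        }
    }
    return sum;
}
```

Both functions return the same result for the same input. On a GPU, the buffer that the unfused version materializes would typically live in global memory between two kernel launches, which is the traffic that a fused kernel avoids.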