nvJPEG2000 Documentation

Introduction

The nvJPEG2000 library accelerates the decoding of JPEG 2000 images on NVIDIA GPUs. The library is built on the CUDA platform and is supported on Pascal+ GPU architectures.

Note

Throughout this document, the terms “CPU” and “Host” are used synonymously. Similarly, the terms “GPU” and “Device” are synonymous.

nvJPEG2000 Decoder

The library uses both the CPU and the GPU for decoding. The Tier-2 decode stage (the first stage of decoding; refer to the JPEG 2000 specification for details) runs on the CPU. All other stages of the decoding process are offloaded to the GPU.

The nvJPEG2000 decoder supports the following subset of JPEG 2000 Part 1:

  • Up to 16 bits per component

  • Number of components (channels): 1 or 3

  • Reversible (5-3) and irreversible (9-7) wavelet transforms

  • Multiple tiles per image. Each tile can have a single tile partition

  • Single quality layer per image

  • All components must have the same dimensions (4:4:4 subsampling only)

  • LRCP (Layer-Resolution-Component-Position) and RLCP (Resolution-Layer-Component-Position) progression orders

  • Image and tile start offsets must be 0

  • The JP2 file format and raw JPEG 2000 codestreams are supported
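Some of the constraints listed above can be checked from the parsed component information before a decode is attempted. The following is a minimal sketch, not part of the library: the helper name is_supported_layout is hypothetical, and the struct is copied from the Component Information section so that the sketch compiles without the nvJPEG2000 header. It only covers component count, bit depth, and matching dimensions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Copied from the Component Information section; real code would
   include the nvJPEG2000 header instead of redefining the type. */
typedef struct
{
    uint32_t component_width;
    uint32_t component_height;
    uint8_t  precision;
    uint8_t  sgn;
} nvjpeg2kImageComponentInfo_t;

/* Hypothetical helper: returns true when the component layout matches the
   documented subset (1 or 3 components, at most 16 bits per component,
   identical dimensions for every component). */
static bool is_supported_layout(const nvjpeg2kImageComponentInfo_t *comps,
                                uint32_t num_components)
{
    assert(comps != NULL);
    if (num_components != 1 && num_components != 3)
        return false;
    for (uint32_t c = 0; c < num_components; c++)
    {
        if (comps[c].precision == 0 || comps[c].precision > 16)
            return false;
        if (comps[c].component_width  != comps[0].component_width ||
            comps[c].component_height != comps[0].component_height)
            return false;
    }
    return true;
}
```

In an application, the array would be filled by nvjpeg2kStreamGetImageComponentInfo(). The remaining constraints (progression order, quality layers, offsets) are not visible in this struct, so the decoder's return status is still the final word.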

Prerequisites

  • CUDA Toolkit version 11.0 and above

  • CUDA Driver version r450 and above

Platforms Supported

  • Linux versions:

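| Architecture | Distribution | Version | Kernel | GCC | GLIBC |
|---|---|---|---|---|---|
| x86_64 | RHEL/CentOS | 7.8 | 3.10.0 | 4.8.5 | 2.17 |
| x86_64 | RHEL/CentOS | 8.2 | 4.18 | 8.3.1 | 2.28 |
| x86_64 | Ubuntu | 20.04.1 | 5.4.0 | 9.3.0 | 2.32 |
| x86_64 | Ubuntu | 18.04.5 | 4.15.0 | 8.2.0 | 2.27 |
| x86_64 | Ubuntu | 16.04.7 | 4.5.0 | 5.4.0 | 2.23 |
| x86_64 | OpenSUSE Leap | 15.2 | 4.12.14 | 7.4.0 | 2.26-lp151 |
| x86_64 | SUSE SLES | 15.2 | 4.12.14 | 7.3.1 | 2.26 |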

  • Windows versions:

    • Windows 10 and Windows Server 2019

    • Support added from version 0.1.0 onwards

Thread Safety

Not all nvJPEG2000 types are thread safe. Each thread must create its own instances of nvjpeg2kDecodeState_t and nvjpeg2kStream_t.

Quick Start Guide

This section explains how to use the decoder API in a few steps. The API details are covered in the following sections.

JPEG2000 Decode

The library expects the bitstream to be in host memory and writes the decoded output to device memory.

Note

nvJPEG2000 Decode sample can be found under this repository: https://github.com/NVIDIA/CUDALibrarySamples/tree/master/nvJPEG2000

  1. The first step is to initialize the handles used in the decoder workflow:

  • nvjpeg2kStream_t - is used to parse the bitstream and store the bitstream metadata

  • nvjpeg2kHandle_t - is the nvjpeg2k library handle

  • nvjpeg2kDecodeState_t - is used to store the work buffers required for decode

nvjpeg2kHandle_t nvjpeg2k_handle;
nvjpeg2kStream_t nvjpeg2k_stream;
nvjpeg2kDecodeState_t decode_state;

nvjpeg2kCreateSimple(&nvjpeg2k_handle);
nvjpeg2kDecodeStateCreate(nvjpeg2k_handle, &decode_state);
nvjpeg2kStreamCreate(&nvjpeg2k_stream);

  2. Read the JPEG 2000 bitstream file and store it in a host buffer. The library supports .jp2 files and JPEG 2000 codestreams.

  3. Use the nvjpeg2kStreamParse API to parse the bitstream.

size_t length;
unsigned char *bitstream_buffer;  // host or pinned memory
// read the bitstream and store it in bitstream_buffer;

// content of bitstream buffer should not be overwritten until the decoding is complete
nvjpeg2kStreamParse(nvjpeg2k_handle, bitstream_buffer, length, 0, 0, nvjpeg2k_stream);

  4. Extract the image dimensions for each component and allocate output memory on the device, as shown in the snippet below. The decoded output is stored in nvjpeg2kImage_t, which can hold 8-bit or 16-bit output. The snippet initializes nvjpeg2kImage_t for an 8-bit, 3-channel image.

#define NUM_COMPONENTS 3
// extract image info
nvjpeg2kImageInfo_t image_info;
nvjpeg2kStreamGetImageInfo(nvjpeg2k_stream, &image_info);

// assuming the decoding of images with 8 bit precision, and 3 components

nvjpeg2kImageComponentInfo_t image_comp_info[NUM_COMPONENTS];

for (int c = 0; c < NUM_COMPONENTS; c++)
{
    nvjpeg2kStreamGetImageComponentInfo(nvjpeg2k_stream, &image_comp_info[c], c);
}

unsigned char *decode_output[NUM_COMPONENTS];  // device pointers, one per component
size_t pitch_in_bytes[NUM_COMPONENTS];         // row pitch of each component buffer

nvjpeg2kImage_t output_image;
for (int c = 0; c < NUM_COMPONENTS; c++)
{
    // the width argument is in bytes; 8 bit samples occupy one byte each
    cudaMallocPitch((void **)&decode_output[c], &pitch_in_bytes[c],
                    image_comp_info[c].component_width, image_comp_info[c].component_height);
}

output_image.pixel_data = (void **)decode_output;
output_image.pixel_type = NVJPEG2K_UINT8;
output_image.pitch_in_bytes = pitch_in_bytes;
output_image.num_components = NUM_COMPONENTS;

  5. Call nvjpeg2kDecode(). The function takes a cudaStream_t parameter, and the library issues its asynchronous CUDA calls on the stream that is passed. Because nvjpeg2kDecode() is asynchronous with respect to the host, a synchronization call such as cudaDeviceSynchronize() is required to complete the decoding process.

nvjpeg2kDecode(nvjpeg2k_handle, decode_state, nvjpeg2k_stream, &output_image, 0); // 0 corresponds to cudaStream_t
cudaDeviceSynchronize();

  6. Go to step 2 to decode another image. Once decoding is complete, release the nvjpeg2k library resources by calling the corresponding destroy APIs: nvjpeg2kStreamDestroy(), nvjpeg2kDecodeStateDestroy(), and nvjpeg2kDestroy().

Type Declarations

API Return Status Codes

The return codes of the nvJPEG2000 APIs are listed below:

typedef enum
{
    NVJPEG2K_STATUS_SUCCESS                       = 0,
    NVJPEG2K_STATUS_NOT_INITIALIZED               = 1,
    NVJPEG2K_STATUS_INVALID_PARAMETER             = 2,
    NVJPEG2K_STATUS_BAD_JPEG                      = 3,
    NVJPEG2K_STATUS_JPEG_NOT_SUPPORTED            = 4,
    NVJPEG2K_STATUS_ALLOCATOR_FAILURE             = 5,
    NVJPEG2K_STATUS_EXECUTION_FAILED              = 6,
    NVJPEG2K_STATUS_ARCH_MISMATCH                 = 7,
    NVJPEG2K_STATUS_INTERNAL_ERROR                = 8,
    NVJPEG2K_STATUS_IMPLEMENTATION_NOT_SUPPORTED  = 9,
} nvjpeg2kStatus_t;

Description of the Return Codes

| Return Code | Description |
|---|---|
| NVJPEG2K_STATUS_SUCCESS (0) | The API call has finished successfully. Note that many of the calls are asynchronous and some of the errors may be seen only after synchronization. |
| NVJPEG2K_STATUS_NOT_INITIALIZED (1) | The library handle was not initialized. |
| NVJPEG2K_STATUS_INVALID_PARAMETER (2) | A wrong parameter was passed. For example, a null pointer as input data, or an invalid enum value. |
| NVJPEG2K_STATUS_BAD_JPEG (3) | Cannot parse the JPEG2000 stream, likely due to a corruption that cannot be handled. |
| NVJPEG2K_STATUS_JPEG_NOT_SUPPORTED (4) | Attempting to decode a JPEG2000 stream that is not supported by the nvJPEG2000 library. |
| NVJPEG2K_STATUS_ALLOCATOR_FAILURE (5) | The user-provided allocator functions, for either memory allocation or for releasing the memory, returned a non-zero code. |
| NVJPEG2K_STATUS_EXECUTION_FAILED (6) | Error during the execution of the device tasks. |
| NVJPEG2K_STATUS_ARCH_MISMATCH (7) | The device capabilities are not enough for the set of input parameters provided. |
| NVJPEG2K_STATUS_INTERNAL_ERROR (8) | An unknown error occurred in the library. |
| NVJPEG2K_STATUS_IMPLEMENTATION_NOT_SUPPORTED (9) | The API is not supported by the backend. |
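Since every API call returns one of these codes, applications often wrap calls in a small checking helper. The following is a minimal sketch under stated assumptions: status_name and check_status are hypothetical helpers that are not provided by the library, and the enum is copied from the listing above so that the sketch compiles without the nvJPEG2000 header.

```c
#include <assert.h>
#include <stdio.h>

/* Copied from the listing above; real code would include the
   nvJPEG2000 header instead of redefining the enum. */
typedef enum
{
    NVJPEG2K_STATUS_SUCCESS                       = 0,
    NVJPEG2K_STATUS_NOT_INITIALIZED               = 1,
    NVJPEG2K_STATUS_INVALID_PARAMETER             = 2,
    NVJPEG2K_STATUS_BAD_JPEG                      = 3,
    NVJPEG2K_STATUS_JPEG_NOT_SUPPORTED            = 4,
    NVJPEG2K_STATUS_ALLOCATOR_FAILURE             = 5,
    NVJPEG2K_STATUS_EXECUTION_FAILED              = 6,
    NVJPEG2K_STATUS_ARCH_MISMATCH                 = 7,
    NVJPEG2K_STATUS_INTERNAL_ERROR                = 8,
    NVJPEG2K_STATUS_IMPLEMENTATION_NOT_SUPPORTED  = 9,
} nvjpeg2kStatus_t;

/* Hypothetical helper: maps a status code to its enumerator name. */
static const char *status_name(nvjpeg2kStatus_t status)
{
    switch (status)
    {
    case NVJPEG2K_STATUS_SUCCESS:           return "NVJPEG2K_STATUS_SUCCESS";
    case NVJPEG2K_STATUS_NOT_INITIALIZED:   return "NVJPEG2K_STATUS_NOT_INITIALIZED";
    case NVJPEG2K_STATUS_INVALID_PARAMETER: return "NVJPEG2K_STATUS_INVALID_PARAMETER";
    case NVJPEG2K_STATUS_BAD_JPEG:          return "NVJPEG2K_STATUS_BAD_JPEG";
    case NVJPEG2K_STATUS_JPEG_NOT_SUPPORTED:return "NVJPEG2K_STATUS_JPEG_NOT_SUPPORTED";
    case NVJPEG2K_STATUS_ALLOCATOR_FAILURE: return "NVJPEG2K_STATUS_ALLOCATOR_FAILURE";
    case NVJPEG2K_STATUS_EXECUTION_FAILED:  return "NVJPEG2K_STATUS_EXECUTION_FAILED";
    case NVJPEG2K_STATUS_ARCH_MISMATCH:     return "NVJPEG2K_STATUS_ARCH_MISMATCH";
    case NVJPEG2K_STATUS_INTERNAL_ERROR:    return "NVJPEG2K_STATUS_INTERNAL_ERROR";
    case NVJPEG2K_STATUS_IMPLEMENTATION_NOT_SUPPORTED:
        return "NVJPEG2K_STATUS_IMPLEMENTATION_NOT_SUPPORTED";
    }
    return "unknown nvjpeg2kStatus_t value";
}

/* Hypothetical helper: logs a failed call to stderr.
   Returns 0 on success and -1 on any other status. */
static int check_status(nvjpeg2kStatus_t status, const char *what)
{
    assert(what != NULL);
    if (status == NVJPEG2K_STATUS_SUCCESS)
        return 0;
    fprintf(stderr, "%s failed: %s (%d)\n", what, status_name(status), (int)status);
    return -1;
}
```

A call site might then read `if (check_status(nvjpeg2kCreateSimple(&handle), "nvjpeg2kCreateSimple") != 0) return 1;`. Because decoding is asynchronous, a success status from the decode call does not rule out errors that only surface after synchronization.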

Device Allocator Interface

typedef int (*nvjpeg2kDeviceMalloc)(void**, size_t);
typedef int (*nvjpeg2kDeviceFree)(void*);
typedef struct
{
    nvjpeg2kDeviceMalloc device_malloc;
    nvjpeg2kDeviceFree device_free;
} nvjpeg2kDeviceAllocator_t;

When the nvjpeg2kDeviceAllocator_t *device_allocator parameter in the nvjpeg2kCreate() function is set as a pointer to the above nvjpeg2kDeviceAllocator_t structure, this structure is used for allocating and releasing the device memory. The function prototypes for the memory allocation and memory freeing functions are similar to the cudaMalloc() and cudaFree() functions. They should return 0 in case of success, and non-zero otherwise.

However, if the nvjpeg2kDeviceAllocator_t *device_allocator parameter in the nvjpeg2kCreate() function is set to NULL, then the default memory allocation functions cudaMalloc() and cudaFree() are used. When the library handle is created with nvjpeg2kCreateSimple(), the default device memory allocator is used.
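The shape of a user-provided allocator can be sketched as follows. This is an illustration only: counting_malloc, counting_free, live_allocations, and counting_allocator are hypothetical names, the typedefs are copied from the listing above, and malloc()/free() stand in for cudaMalloc()/cudaFree() so that the sketch builds without the CUDA toolkit. A real device allocator must return device memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Copied from the listing above; normally provided by the library header. */
typedef int (*nvjpeg2kDeviceMalloc)(void**, size_t);
typedef int (*nvjpeg2kDeviceFree)(void*);
typedef struct
{
    nvjpeg2kDeviceMalloc device_malloc;
    nvjpeg2kDeviceFree device_free;
} nvjpeg2kDeviceAllocator_t;

/* Hypothetical bookkeeping: number of allocations not yet released. */
static size_t live_allocations = 0;

static int counting_malloc(void **ptr, size_t size)
{
    assert(ptr != NULL);
    /* Stand-in for cudaMalloc(ptr, size); replace in real code. */
    *ptr = malloc(size);
    if (*ptr == NULL)
        return 1;               /* non-zero reports failure to the library */
    live_allocations++;
    return 0;                   /* 0 reports success */
}

static int counting_free(void *ptr)
{
    if (ptr != NULL)
    {
        /* Stand-in for cudaFree(ptr); replace in real code. */
        free(ptr);
        live_allocations--;
    }
    return 0;
}

/* Both callbacks bundled in the structure expected by nvjpeg2kCreate(). */
static nvjpeg2kDeviceAllocator_t counting_allocator = { counting_malloc, counting_free };
```

With the stand-ins replaced, &counting_allocator would be passed as the device_allocator argument of nvjpeg2kCreate(). A counter like this is one simple way to notice buffers that were never released.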

Pinned Allocator Interface

typedef int (*nvjpeg2kPinnedMalloc)(void**, size_t, unsigned int flags);
typedef int (*nvjpeg2kPinnedFree)(void*);
typedef struct
{
    nvjpeg2kPinnedMalloc pinned_malloc;
    nvjpeg2kPinnedFree   pinned_free;
} nvjpeg2kPinnedAllocator_t;

When the nvjpeg2kPinnedAllocator_t *pinned_allocator parameter in the nvjpeg2kCreate() function is set as a pointer to the above nvjpeg2kPinnedAllocator_t structure, this structure is used for allocating and releasing host pinned memory for copying data to and from the device. The function prototypes for the memory allocation and memory freeing functions are similar to the cudaHostAlloc() and cudaFreeHost() functions. They should return 0 in case of success, and non-zero otherwise.

However, if the nvjpeg2kPinnedAllocator_t *pinned_allocator parameter in the nvjpeg2kCreate() function is set to NULL, then the default memory allocation functions cudaHostAlloc() and cudaFreeHost() are used. When the library handle is created with nvjpeg2kCreateSimple(), the default host pinned memory allocator is used.

Library Backend

typedef enum
{
    NVJPEG2K_BACKEND_DEFAULT = 0
} nvjpeg2kBackend_t;

The nvjpeg2kBackend_t enum lets the user select the internal implementation. Only a single implementation is currently supported. Additional implementations may be added in the future.

Component Information

typedef struct
{
    uint32_t component_width;
    uint32_t component_height;
    uint8_t  precision;
    uint8_t  sgn;
} nvjpeg2kImageComponentInfo_t;

nvjpeg2kImageComponentInfo_t is used to retrieve component level information. This information can be used for allocating output buffers.

Image Information

typedef struct
{
    uint32_t image_width;
    uint32_t image_height;
    uint32_t tile_width;
    uint32_t tile_height;
    uint32_t num_tiles_x;
    uint32_t num_tiles_y;
    uint32_t num_components;
} nvjpeg2kImageInfo_t;

nvjpeg2kImageInfo_t is used to retrieve image information which can be used to allocate output buffers.
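The tile fields relate to the image dimensions through simple grid arithmetic. The sketch below shows how the pixel rectangle of one tile could be derived. It assumes zero image and tile offsets (as required by the supported subset), so that tiles lie on a regular grid and are clipped at the right and bottom image edges. TileRect and tile_rect are hypothetical names, and the struct is copied from the listing above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copied from the listing above; normally provided by the library header. */
typedef struct
{
    uint32_t image_width;
    uint32_t image_height;
    uint32_t tile_width;
    uint32_t tile_height;
    uint32_t num_tiles_x;
    uint32_t num_tiles_y;
    uint32_t num_components;
} nvjpeg2kImageInfo_t;

/* Hypothetical type: position and size of one tile in image coordinates. */
typedef struct
{
    uint32_t x0, y0, width, height;
} TileRect;

/* Hypothetical helper: rectangle covered by tile (tx, ty),
   assuming image and tile offsets of 0. */
static TileRect tile_rect(const nvjpeg2kImageInfo_t *info, uint32_t tx, uint32_t ty)
{
    assert(info != NULL);
    assert(tx < info->num_tiles_x && ty < info->num_tiles_y);

    TileRect r;
    r.x0 = tx * info->tile_width;
    r.y0 = ty * info->tile_height;

    /* Tiles in the last column or row are clipped at the image boundary. */
    uint32_t width_left  = info->image_width  - r.x0;
    uint32_t height_left = info->image_height - r.y0;
    r.width  = width_left  < info->tile_width  ? width_left  : info->tile_width;
    r.height = height_left < info->tile_height ? height_left : info->tile_height;
    return r;
}
```

For a 1000 x 600 image with 512 x 512 tiles, the grid is 2 x 2 and the bottom-right tile covers 488 x 88 pixels starting at (512, 512).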

Image Type

typedef enum
{
    NVJPEG2K_UINT8 = 0,
    NVJPEG2K_UINT16 = 1
} nvjpeg2kImageType_t;

nvjpeg2kImageType_t describes the pixel data types supported by nvJPEG2000.

Image Data

typedef struct
{
    void **pixel_data;
    size_t *pitch_in_bytes;
    nvjpeg2kImageType_t pixel_type;
    uint32_t num_components;
} nvjpeg2kImage_t;

nvjpeg2kImage_t serves as an image container. It contains an array of void pointers, pixel_data, in which each pointer corresponds to a component in the nvjpeg2k bitstream. A corresponding array, pitch_in_bytes, defines the pitch of each component. pixel_type determines the data type of pixel_data. See Image Type for supported types.
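When sizing the per-component buffers, the pixel type fixes how many bytes each sample occupies, and therefore the smallest valid pitch. The following is a sketch under stated assumptions: the helper names are hypothetical, the enum is copied from the Image Type section, and the rule that a precision of up to 8 bits maps to NVJPEG2K_UINT8 (and anything larger to NVJPEG2K_UINT16) is an assumption of this example, not a statement from the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copied from the Image Type section; normally provided by the library header. */
typedef enum
{
    NVJPEG2K_UINT8 = 0,
    NVJPEG2K_UINT16 = 1
} nvjpeg2kImageType_t;

/* Hypothetical helper: picks an output type for a component precision.
   Assumption: up to 8 bits fits NVJPEG2K_UINT8, otherwise NVJPEG2K_UINT16. */
static nvjpeg2kImageType_t pixel_type_for_precision(uint8_t precision)
{
    assert(precision >= 1 && precision <= 16);
    return precision <= 8 ? NVJPEG2K_UINT8 : NVJPEG2K_UINT16;
}

/* Hypothetical helper: storage size of one sample of the given type. */
static size_t bytes_per_sample(nvjpeg2kImageType_t type)
{
    return type == NVJPEG2K_UINT16 ? sizeof(uint16_t) : sizeof(uint8_t);
}

/* Hypothetical helper: smallest row pitch, in bytes, that holds one
   row of a component of the given width. */
static size_t min_pitch_in_bytes(uint32_t component_width, nvjpeg2kImageType_t type)
{
    return (size_t)component_width * bytes_per_sample(type);
}
```

The value returned by min_pitch_in_bytes() is the width that would be requested from cudaMallocPitch(). The pitch that cudaMallocPitch() reports back may be larger, and that reported value is what belongs in pitch_in_bytes.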

Library Handle

struct nvjpeg2kHandle;
typedef struct nvjpeg2kHandle* nvjpeg2kHandle_t;

This handle should be instantiated prior to using any of the APIs. It is thread safe and can be used by multiple threads simultaneously.

Decoder State

struct nvjpeg2kDecodeState;
typedef struct nvjpeg2kDecodeState* nvjpeg2kDecodeState_t;

The nvjpeg2kDecodeState_t handle stores intermediate decode information. This handle can be reused when decoding multiple images. The user must ensure that a CUDA stream or device synchronization call is made between the decoding of two images.

Bitstream Handle

struct nvjpeg2kStream;
typedef struct nvjpeg2kStream* nvjpeg2kStream_t;

This handle is used for parsing the bitstream. Bitstream metadata can be extracted by using the APIs defined in Parser API Reference.

API Reference

Helper API Reference

nvjpeg2kGetCudartProperty()

Gets the numeric value for the major version, minor version, or patch level of the CUDA toolkit that was used to build the nvJPEG2000 library.

Signature:

nvjpeg2kStatus_t NVJPEG2KAPI nvjpeg2kGetCudartProperty(libraryPropertyType type, int *value);

Parameters:

| Parameter | Input/Output | Memory | Description |
|---|---|---|---|
| libraryPropertyType type | Input | Host | One of the supported libraryPropertyType values, that is, MAJOR_VERSION, MINOR_VERSION or PATCH_LEVEL |
| int *value | Output | Host | The numeric value corresponding to the specific libraryPropertyType requested. |

nvjpeg2kGetProperty()

Gets the numeric value for the major or minor version, or the patch level, of the nvJPEG2000 library.

Signature:

nvjpeg2kStatus_t NVJPEG2KAPI nvjpeg2kGetProperty(libraryPropertyType type, int *value);

Parameters:

| Parameter | Input/Output | Memory | Description |
|---|---|---|---|
| libraryPropertyType type | Input | Host | One of the supported libraryPropertyType values, that is, MAJOR_VERSION, MINOR_VERSION or PATCH_LEVEL |
| int *value | Output | Host | The numeric value corresponding to the specific libraryPropertyType requested. |

nvjpeg2kCreateSimple()

Creates an instance of the library handle with default backend and memory allocators.

Signature:

nvjpeg2kStatus_t nvjpeg2kCreateSimple(nvjpeg2kHandle_t *handle);

Parameters:

| Parameter | Input/Output | Memory | Description |
|---|---|---|---|
| nvjpeg2kHandle_t *handle | Input/Output | Host | nvjpeg2k library handle |

Returns:

nvjpeg2kStatus_t - An error code as specified in API Return Status Codes

nvjpeg2kCreate()

Creates an instance of the library handle using the supplied arguments. The user can choose the backend implementation and provide custom allocators.

Signature:

nvjpeg2kStatus_t nvjpeg2kCreate(
        nvjpeg2kBackend_t backend,
        nvjpeg2kDeviceAllocator_t *device_allocator,
        nvjpeg2kPinnedAllocator_t *pinned_allocator,
        nvjpeg2kHandle_t *handle);

Parameters:

| Parameter | Input/Output | Memory | Description |
|---|---|---|---|
| nvjpeg2kBackend_t backend | Input | Host | Backend parameter. See Library Backend |
| nvjpeg2kDeviceAllocator_t *device_allocator | Input | Host | Device allocator. cudaMalloc and cudaFree are used if set to NULL. See Device Allocator Interface |
| nvjpeg2kPinnedAllocator_t *pinned_allocator | Input | Host | Pinned allocator. cudaHostAlloc and cudaFreeHost are used if set to NULL. See Pinned Allocator Interface |
| nvjpeg2kHandle_t *handle | Input/Output | Host | nvjpeg2k library handle |

Returns:

nvjpeg2kStatus_t - An error code as specified in API Return Status Codes

nvjpeg2kDestroy()

Releases the nvjpeg2k library handle.

Signature:

nvjpeg2kStatus_t nvjpeg2kDestroy(nvjpeg2kHandle_t handle);

Parameters:

| Parameter | Input/Output | Memory | Description |
|---|---|---|---|
| nvjpeg2kHandle_t handle | Input | Host | nvjpeg2k library handle |

Returns:

nvjpeg2kStatus_t - An error code as specified in API Return Status Codes

nvjpeg2kDecodeStateCreate()

Creates an instance of the Decode State.

Signature:

nvjpeg2kStatus_t nvjpeg2kDecodeStateCreate(
        nvjpeg2kHandle_t handle,
        nvjpeg2kDecodeState_t *decode_state);

Parameters:

| Parameter | Input/Output | Memory | Description |
|---|---|---|---|
| nvjpeg2kHandle_t handle | Input | Host | nvjpeg2k library handle |
| nvjpeg2kDecodeState_t *decode_state | Input/Output | Host | decode state handle |

Returns:

nvjpeg2kStatus_t - An error code as specified in API Return Status Codes

nvjpeg2kDecodeStateDestroy()

Releases the Decode State handle.

Signature:

nvjpeg2kStatus_t nvjpeg2kDecodeStateDestroy(nvjpeg2kDecodeState_t decode_state);

Parameters:

| Parameter | Input/Output | Memory | Description |
|---|---|---|---|
| nvjpeg2kDecodeState_t decode_state | Input | Host | decode state handle |

Returns:

nvjpeg2kStatus_t - An error code as specified in API Return Status Codes

nvjpeg2kStreamCreate()

Creates an instance of the bitstream handle.

Signature:

nvjpeg2kStatus_t nvjpeg2kStreamCreate(nvjpeg2kStream_t *stream_handle);

Parameters:

| Parameter | Input/Output | Memory | Description |
|---|---|---|---|
| nvjpeg2kStream_t *stream_handle | Input/Output | Host | nvjpeg2k bitstream handle |

Returns:

nvjpeg2kStatus_t - An error code as specified in API Return Status Codes

nvjpeg2kStreamDestroy()

Releases the bitstream handle.

Signature:

nvjpeg2kStatus_t nvjpeg2kStreamDestroy(nvjpeg2kStream_t stream_handle);

Parameters:

| Parameter | Input/Output | Memory | Description |
|---|---|---|---|
| nvjpeg2kStream_t stream_handle | Input | Host | nvjpeg2k bitstream handle |

Returns:

nvjpeg2kStatus_t - An error code as specified in API Return Status Codes

Parser API Reference

nvjpeg2kStreamParse()

This function is the first step in the decoding of a JPEG2000 bitstream. It accepts the bitstream buffer on host memory as input and parses the JPEG2000 header information. The parsed information is stored in the nvjpeg2kStream_t handle and can be retrieved by the APIs documented in this section.

Signature:

nvjpeg2kStatus_t nvjpeg2kStreamParse(nvjpeg2kHandle_t handle,
        const unsigned char *data,
        size_t length,
        int save_metadata,
        int save_stream,
        nvjpeg2kStream_t stream_handle);

Parameters:

| Parameter | Input/Output | Memory | Description |
|---|---|---|---|
| nvjpeg2kHandle_t handle | Input | Host | library handle |
| const unsigned char *data | Input | Host | bitstream buffer |
| size_t length | Input | Host | bitstream size in bytes |
| int save_metadata | Input | Host | Set to 0. Added for future use |
| int save_stream | Input | Host | Set to 0. Added for future use |
| nvjpeg2kStream_t stream_handle | Input | Host | bitstream handle |

Returns:

nvjpeg2kStatus_t - An error code as specified in API Return Status Codes

nvjpeg2kStreamGetImageInfo()

Retrieves the image information defined in nvjpeg2kImageInfo_t. This information is useful in allocating output buffers on device memory.

Signature:

nvjpeg2kStatus_t nvjpeg2kStreamGetImageInfo(nvjpeg2kStream_t stream_handle,
        nvjpeg2kImageInfo_t* image_info);

Parameters:

| Parameter | Input/Output | Memory | Description |
|---|---|---|---|
| nvjpeg2kStream_t stream_handle | Input | Host | bitstream handle |
| nvjpeg2kImageInfo_t* image_info | Input/Output | Host | Pointer to nvjpeg2kImageInfo_t |

Returns:

nvjpeg2kStatus_t - An error code as specified in API Return Status Codes

nvjpeg2kStreamGetImageComponentInfo()

Retrieves the component-level information defined in nvjpeg2kImageComponentInfo_t. This information can be used in allocating output buffers on device memory. Component-level information is useful when the dimensions vary across components.

Signature:

nvjpeg2kStatus_t nvjpeg2kStreamGetImageComponentInfo(nvjpeg2kStream_t stream_handle,
        nvjpeg2kImageComponentInfo_t* component_info,
        uint32_t component_id);

Parameters:

| Parameter | Input/Output | Memory | Description |
|---|---|---|---|
| nvjpeg2kStream_t stream_handle | Input | Host | bitstream handle |
| nvjpeg2kImageComponentInfo_t* component_info | Input/Output | Host | Pointer to nvjpeg2kImageComponentInfo_t |
| uint32_t component_id | Input | Host | Component index |

Returns:

nvjpeg2kStatus_t - An error code as specified in API Return Status Codes

Decode API Reference

nvjpeg2kDecode()

Decodes a single image and writes it to the output device buffers. The function is asynchronous with respect to the host. All GPU tasks are submitted to the provided stream. Prior to calling this function, the user must parse the bitstream using nvjpeg2kStreamParse() so that the bitstream information is stored in jpeg2k_stream.

Signature:

nvjpeg2kStatus_t nvjpeg2kDecode(nvjpeg2kHandle_t handle,
        nvjpeg2kDecodeState_t decode_state,
        nvjpeg2kStream_t jpeg2k_stream,
        nvjpeg2kImage_t* decode_output,
        cudaStream_t stream);

Parameters:

| Parameter | Input/Output | Memory | Description |
|---|---|---|---|
| nvjpeg2kHandle_t handle | Input | Host | library handle |
| nvjpeg2kDecodeState_t decode_state | Input | Host | decode state handle |
| nvjpeg2kStream_t jpeg2k_stream | Input | Host | nvjpeg2k bitstream handle |
| nvjpeg2kImage_t* decode_output | Input/Output | Host | Decode output struct. The struct should be on host memory. The image component pointers should point to device memory. See Image Data |
| cudaStream_t stream | Input | Host | CUDA stream to which all GPU tasks are submitted |

Returns:

nvjpeg2kStatus_t - An error code as specified in API Return Status Codes